In [1]: # LOAD PYTHON PACKAGES
In [2]: import pandas as pd
In [3]: import pyper as pr
In [4]: # READ DATA
In [5]: data = pd.read_table("/home/liuwensui/Documents/data/csdata.txt", header = 0)
In [6]: # CREATE A R INSTANCE WITH PYPER
In [7]: r = pr.R(use_pandas = True)
In [8]: # PASS DATA FROM PYTHON TO R
In [9]: r.assign("rdata", data)
In [10]: # SHOW DATA SUMMARY
In [11]: print(r("summary(rdata)"))
try({summary(rdata)})
LEV_LT3 TAX_NDEB COLLAT1 SIZE1
Min. :0.00000 Min. : 0.0000 Min. :0.0000 Min. : 7.738
1st Qu.:0.00000 1st Qu.: 0.3494 1st Qu.:0.1241 1st Qu.:12.317
Median :0.00000 Median : 0.5666 Median :0.2876 Median :13.540
Mean :0.09083 Mean : 0.8245 Mean :0.3174 Mean :13.511
3rd Qu.:0.01169 3rd Qu.: 0.7891 3rd Qu.:0.4724 3rd Qu.:14.751
Max. :0.99837 Max. :102.1495 Max. :0.9953 Max. :18.587
PROF2 GROWTH2 AGE LIQ
Min. :0.0000158 Min. :-81.248 Min. : 6.00 Min. :0.00000
1st Qu.:0.0721233 1st Qu.: -3.563 1st Qu.: 11.00 1st Qu.:0.03483
Median :0.1203435 Median : 6.164 Median : 17.00 Median :0.10854
Mean :0.1445929 Mean : 13.620 Mean : 20.37 Mean :0.20281
3rd Qu.:0.1875148 3rd Qu.: 21.952 3rd Qu.: 25.00 3rd Qu.:0.29137
Max. :1.5902009 Max. :681.354 Max. :210.00 Max. :1.00018
IND2A IND3A IND4A IND5A
Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000
Median :1.0000 Median :0.0000 Median :0.00000 Median :0.00000
Mean :0.6116 Mean :0.1902 Mean :0.02692 Mean :0.09907
3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000
Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000
In [12]: # LOAD R PACKAGE
In [13]: r("library(betareg)")
Out[13]: 'try({library(betareg)})\nLoading required package: Formula\n'
In [14]: # ESTIMATE A BETA REGRESSION
In [15]: r("m <- betareg(LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata, subset = LEV_LT3 >0)")
Out[15]: 'try({m <- betareg(LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata, subset = LEV_LT3 >0)})\n'
In [16]: # OUTPUT MODEL SUMMARY
In [17]: print(r("summary(m)"))
try({summary(m)})
Call:
betareg(formula = LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata,
subset = LEV_LT3 >0)
Standardized weighted residuals 2:
Min 1Q Median 3Q Max
-7.2802 -0.5194 0.0777 0.6037 5.8777
Coefficients (mean model with logit link):
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.229773 0.312990 3.929 8.53e-05 ***
SIZE1 -0.105009 0.021211 -4.951 7.39e-07 ***
PROF2 -2.414794 0.377271 -6.401 1.55e-10 ***
GROWTH2 0.003306 0.001043 3.169 0.00153 **
AGE -0.004999 0.001795 -2.786 0.00534 **
IND3A 0.688314 0.074069 9.293 <2e-16 ***
Phi coefficients (precision model with identity link):
Estimate Std. Error z value Pr(>|z|)
(phi) 3.9362 0.1528 25.77 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Type of estimator: ML (maximum likelihood)
Log-likelihood: 266.7 on 7 Df
Pseudo R-squared: 0.1468
Number of iterations: 25 (BFGS) + 2 (Fisher scoring)
In [18]: # CALCULATE MODEL PREDICTION
In [19]: r("beta_fit <- predict(m, type = 'response')")
Out[19]: "try({beta_fit <- predict(m, type = 'response')})\n"
In [20]: # SHOW PREDICTION SUMMARY IN R
In [21]: print(r("summary(beta_fit)"))
try({summary(beta_fit)})
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1634 0.3069 0.3465 0.3657 0.4007 0.6695
In [22]: # PASS DATA FROM R TO PYTHON
In [23]: pydata = pd.DataFrame(r.get("beta_fit"), columns = ["y_hat"])
In [24]: # SHOW PREDICTION SUMMARY IN PYTHON
In [25]: pydata.y_hat.describe()
Out[25]:
count 1116.000000
mean 0.365675
std 0.089804
min 0.163388
25% 0.306897
50% 0.346483
75% 0.400656
max 0.669489
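The "mean model with logit link" line in the summary means each fitted value is the inverse logit of the linear predictor, and in betareg's mean/precision parameterization the response variance is mu(1 - mu)/(1 + phi). The minimal Python sketch below plugs the printed estimates into those two formulas. The helper names and the example covariate values (rounded from the full-sample medians in summary(rdata)) are hypothetical and only for illustration; for real predictions, predict(m, ...) in R remains the reference.

```python
import math

# Estimates copied from the betareg summary above (mean model, logit link).
COEFS = {
    "(Intercept)": 1.229773,
    "SIZE1": -0.105009,
    "PROF2": -2.414794,
    "GROWTH2": 0.003306,
    "AGE": -0.004999,
    "IND3A": 0.688314,
}
PHI = 3.9362  # precision parameter (identity link)


def inv_logit(eta):
    """Map a linear predictor onto the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_mean(x, coefs=COEFS):
    """Fitted mean mu = inverse logit of the linear predictor.

    x maps covariate names to values; the intercept is always added.
    """
    eta = coefs["(Intercept)"] + sum(coefs[k] * v for k, v in x.items())
    return inv_logit(eta)


def beta_variance(mu, phi=PHI):
    """Variance of a beta response with mean mu and precision phi."""
    return mu * (1.0 - mu) / (1.0 + phi)


if __name__ == "__main__":
    # Hypothetical firm with covariates near the sample medians.
    firm = {"SIZE1": 13.54, "PROF2": 0.12, "GROWTH2": 6.16, "AGE": 17.0, "IND3A": 0}
    mu = beta_mean(firm)
    print(f"mu = {mu:.4f}, var = {beta_variance(mu):.4f}")
```

Because the link is nonlinear, a coefficient such as PROF2 = -2.41 shifts the log-odds of the mean, not the leverage ratio itself; the effect on mu depends on where the other covariates sit.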
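In R I found the function dist.dna() (from the ape package) for computing genetic distances, but I didn't know how to write a loop in R to batch-process many files and compute their genetic distances. So I thought of calling the R function from Python instead, and while looking for tutorials I found that this requires the rpy2 module.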
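easy_install rpy2 failed with an error (I couldn't understand the error message);
pip install rpy2 failed (it said pip needed to be upgraded to pip 19.0.3);
upgrading pip with python -m pip install --upgrade pip failed (I couldn't understand the error message);
following the instructions at https://pip.pypa.io/en/stable/installing/, I reinstalled pip and the upgrade succeeded;
pip install rpy2 still failed (I couldn't understand the error message);
following the tutorial at https://blog.csdn.net/suzyu12345/article/details/51476321 to install rpy2, I got: rpy2-2.9.5-cp37-cp37m-win_amd64.whl is not a supported wheel on this platform.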
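That message usually means the tags in the wheel's filename do not match the running interpreter: cp37 targets CPython 3.7 and win_amd64 targets 64-bit Windows, so pip rejects the file on, for example, a 32-bit Python or a Python other than 3.7. The stdlib-only sketch below compares a wheel filename against the current interpreter. The helpers wheel_tags and roughly_compatible are hypothetical, and the check is deliberately simplified: it ignores the ABI tag and pip's fuller rules (abi3, manylinux, and so on), so it is a first diagnosis, not a replacement for pip.

```python
import sys
import sysconfig


def wheel_tags(filename):
    """Split a wheel filename into its (python, abi, platform) tags.

    Wheel names end in -{python tag}-{abi tag}-{platform tag}.whl.
    """
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel name: " + filename)
    return parts[-3], parts[-2], parts[-1]


def interpreter_tags():
    """Python tags and platform tag of the running interpreter (simplified)."""
    major, minor = sys.version_info[:2]
    py = {f"cp{major}{minor}", f"py{major}{minor}", f"py{major}"}
    # e.g. "win-amd64" -> "win_amd64", "linux-x86_64" -> "linux_x86_64"
    plat = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return py, plat


def roughly_compatible(filename, py_tags=None, platform_tag=None):
    """True if some python tag and some platform tag in the name match.

    The ABI tag is ignored, so True is not a guarantee that pip will accept it.
    """
    auto_py, auto_plat = interpreter_tags()
    py_tags = auto_py if py_tags is None else py_tags
    platform_tag = auto_plat if platform_tag is None else platform_tag
    py, _abi, plat = wheel_tags(filename)
    # Compressed tag sets such as "py2.py3" are separated by dots.
    py_ok = any(t in py_tags for t in py.split("."))
    plat_ok = any(p in ("any", platform_tag) for p in plat.split("."))
    return py_ok and plat_ok


if __name__ == "__main__":
    name = "rpy2-2.9.5-cp37-cp37m-win_amd64.whl"
    print("wheel tags:        ", wheel_tags(name))
    print("this interpreter:  ", interpreter_tags())
    print("roughly compatible:", roughly_compatible(name))
```

If the interpreter prints a different cpXY tag or a platform such as win32, the fix is to pick the wheel built for that combination (or install through conda, as below).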
On the rpy2 home page https://rpy2.bitbucket.io/ I found this sentence: "Released source packages are available on PyPi. Installing should be as easy* as"
(*: except on Windows)
Does this mean that installing with pip is not so easy on Windows?
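I then found the tutorial "rpy2: an example of calling R functions from Python", which installs rpy2 with conda. I tried conda install rpy2 in a Windows command prompt (DOS window) myself, and it succeeded. At the very end, however, it printed the message 此时不应有do (the English-locale cmd.exe message is "do was unexpected at this time."), and I don't understand what it means.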
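For loading R packages from Python, I found that you can use:
Loading the packages that ship with R worked without problems, but loading packages that have to be installed separately raised an error.
Following the tutorial at https://blog.csdn.net/arcers/article/details/79109535, I solved this by installing the required packages with conda install -c r r-ggplot2.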
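With rpy2 working, the batch processing described at the start of this part (running dist.dna() over many files) can be driven from a Python loop. The sketch below assumes rpy2 and the R package ape are installed in the R that rpy2 uses, and that every input is an aligned FASTA file in one folder. The folder names, the *.fasta pattern, the output naming, and batch_dist_dna itself are hypothetical. It relies on importr exposing R names with dots replaced by underscores (read.dna becomes read_dna, dist.dna becomes dist_dna); model="raw" asks dist.dna for the proportion of differing sites.

```python
import glob
import os


def batch_dist_dna(in_folder, out_folder, pattern="*.fasta", model="raw"):
    """Run ape's dist.dna on every alignment file in in_folder.

    Writes one CSV distance matrix per input file into out_folder and
    returns the list of CSV paths written. Assumes rpy2 and the R package
    ape are installed; the folder layout is hypothetical.
    """
    # Imported inside the function so this file can be imported without R.
    from rpy2.robjects.packages import importr

    base = importr("base")
    utils = importr("utils")
    ape = importr("ape")  # read.dna -> ape.read_dna, dist.dna -> ape.dist_dna

    os.makedirs(out_folder, exist_ok=True)
    written = []
    for path in sorted(glob.glob(os.path.join(in_folder, pattern))):
        aln = ape.read_dna(path, format="fasta")
        dist = ape.dist_dna(aln, model=model)  # "raw": proportion of differing sites
        stem = os.path.splitext(os.path.basename(path))[0]
        out_path = os.path.join(out_folder, stem + "_dist.csv")
        # as.matrix turns the R "dist" object into a full square matrix.
        utils.write_csv(base.as_matrix(dist), out_path)
        written.append(out_path)
    return written


if __name__ == "__main__":
    for csv_path in batch_dist_dna("alignments", "distances"):
        print("wrote", csv_path)
```

Keeping the loop in Python and only the distance calculation in R means adding another file is just dropping it into the input folder.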
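A simple measure of how far two nucleotide sequences have diverged is the proportion of sites at which they differ, p = n_d / n,
where n_d is the number of differing nucleotides observed between the two sequences and n is the total number of aligned site pairs compared; p is called the p-distance between the nucleotide sequences.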
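A direct translation of the formula into Python, assuming the sequences are already aligned to the same length. Every position is counted, so gaps and ambiguous bases are compared as ordinary characters; dedicated tools such as dist.dna offer more careful handling. The names p_distance and p_distance_matrix are hypothetical.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """p = n_d / n for two aligned sequences of equal length.

    n_d: number of sites where the nucleotides differ
    n:   number of sites compared
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    if not seq1:
        raise ValueError("empty sequences")
    n_d = sum(a != b for a, b in zip(seq1.upper(), seq2.upper()))
    return n_d / len(seq1)


def p_distance_matrix(seqs):
    """Symmetric matrix (nested dict) of pairwise p-distances for {name: sequence}."""
    names = list(seqs)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        d = p_distance(seqs[a], seqs[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


if __name__ == "__main__":
    # 2 differing sites (positions 5 and 10) out of 10
    print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```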