How to call R from Python with PypeR


In [1]: # LOAD PYTHON PACKAGES

In [2]: import pandas as pd

In [3]: import pyper as pr

In [4]: # READ DATA

In [5]: data = pd.read_table("/home/liuwensui/Documents/data/csdata.txt", header = 0)

In [6]: # CREATE A R INSTANCE WITH PYPER

In [7]: r = pr.R(use_pandas = True)

In [8]: # PASS DATA FROM PYTHON TO R

In [9]: r.assign("rdata", data)

In [10]: # SHOW DATA SUMMARY

In [11]: print(r("summary(rdata)"))

try({summary(rdata)})

    LEV_LT3           TAX_NDEB            COLLAT1           SIZE1
 Min.   :0.00000   Min.   :  0.0000   Min.   :0.0000   Min.   : 7.738
 1st Qu.:0.00000   1st Qu.:  0.3494   1st Qu.:0.1241   1st Qu.:12.317
 Median :0.00000   Median :  0.5666   Median :0.2876   Median :13.540
 Mean   :0.09083   Mean   :  0.8245   Mean   :0.3174   Mean   :13.511
 3rd Qu.:0.01169   3rd Qu.:  0.7891   3rd Qu.:0.4724   3rd Qu.:14.751
 Max.   :0.99837   Max.   :102.1495   Max.   :0.9953   Max.   :18.587

     PROF2              GROWTH2             AGE              LIQ
 Min.   :0.0000158   Min.   :-81.248   Min.   :  6.00   Min.   :0.00000
 1st Qu.:0.0721233   1st Qu.: -3.563   1st Qu.: 11.00   1st Qu.:0.03483
 Median :0.1203435   Median :  6.164   Median : 17.00   Median :0.10854
 Mean   :0.1445929   Mean   : 13.620   Mean   : 20.37   Mean   :0.20281
 3rd Qu.:0.1875148   3rd Qu.: 21.952   3rd Qu.: 25.00   3rd Qu.:0.29137
 Max.   :1.5902009   Max.   :681.354   Max.   :210.00   Max.   :1.00018

     IND2A            IND3A            IND4A             IND5A
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000
 Median :1.0000   Median :0.0000   Median :0.00000   Median :0.00000
 Mean   :0.6116   Mean   :0.1902   Mean   :0.02692   Mean   :0.09907
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000

In [12]: # LOAD R PACKAGE

In [13]: r("library(betareg)")

Out[13]: 'try({library(betareg)})\nLoading required package: Formula\n'

In [14]: # ESTIMATE A BETA REGRESSION

In [15]: r("m <- betareg(LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata, subset = LEV_LT3 >0)")

Out[15]: 'try({m <- betareg(LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata, subset = LEV_LT3 >0)})\n'

In [16]: # OUTPUT MODEL SUMMARY

In [17]: print(r("summary(m)"))

try({summary(m)})

Call:
betareg(formula = LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata,
    subset = LEV_LT3 >0)

Standardized weighted residuals 2:
    Min      1Q  Median      3Q     Max
-7.2802 -0.5194  0.0777  0.6037  5.8777

Coefficients (mean model with logit link):
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  1.229773   0.312990   3.929 8.53e-05 ***
SIZE1       -0.105009   0.021211  -4.951 7.39e-07 ***
PROF2       -2.414794   0.377271  -6.401 1.55e-10 ***
GROWTH2      0.003306   0.001043   3.169  0.00153 **
AGE         -0.004999   0.001795  -2.786  0.00534 **
IND3A        0.688314   0.074069   9.293   <2e-16 ***

Phi coefficients (precision model with identity link):
      Estimate Std. Error z value Pr(>|z|)
(phi)   3.9362     0.1528   25.77   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type of estimator: ML (maximum likelihood)
Log-likelihood: 266.7 on 7 Df
Pseudo R-squared: 0.1468
Number of iterations: 25 (BFGS) + 2 (Fisher scoring)

In [18]: # CALCULATE MODEL PREDICTION

In [19]: r("beta_fit <- predict(m, type = 'response')")

Out[19]: "try({beta_fit <- predict(m, type = 'response')})\n"

In [20]: # SHOW PREDICTION SUMMARY IN R

In [21]: print(r("summary(beta_fit)"))

try({summary(beta_fit)})

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.1634  0.3069  0.3465  0.3657  0.4007  0.6695

In [22]: # PASS DATA FROM R TO PYTHON

In [23]: pydata = pd.DataFrame(r.get("beta_fit"), columns = ["y_hat"])

In [24]: # SHOW PREDICTION SUMMARY IN PYTHON

In [25]: pydata.y_hat.describe()

Out[25]:

count    1116.000000
mean        0.365675
std         0.089804
min         0.163388
25%         0.306897
50%         0.346483
75%         0.400656
max         0.669489
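
The coefficients reported by summary(m) are on the logit scale, so a predicted mean comes from applying the inverse logit to the linear predictor. The Python sketch below illustrates that step using the estimates printed in the session. The input row is hypothetical: it reuses the sample medians from summary(rdata) with IND3A set to 0, and is not an observation from the fitted subset.

```python
import math

# Mean-model estimates copied from the summary(m) output in the session.
COEFS = {
    "(Intercept)": 1.229773,
    "SIZE1": -0.105009,
    "PROF2": -2.414794,
    "GROWTH2": 0.003306,
    "AGE": -0.004999,
    "IND3A": 0.688314,
}


def predict_mean(row, coefs=COEFS):
    """Return the fitted mean for one observation under a logit link."""
    # Linear predictor: intercept plus coefficient * value for each covariate.
    eta = coefs["(Intercept)"] + sum(
        coefs[name] * value for name, value in row.items()
    )
    # Inverse logit maps the linear predictor back to the (0, 1) interval.
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical observation built from the sample medians (an assumption).
example_row = {
    "SIZE1": 13.540,
    "PROF2": 0.1203435,
    "GROWTH2": 6.164,
    "AGE": 17.0,
    "IND3A": 0.0,
}
mu = predict_mean(example_row)
print(round(mu, 4))
```

The result lands near 0.37, in line with the centre of the beta_fit summary, which is a quick sanity check that the values passed back from R are on the response scale.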

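I found the R function dist.dna() for computing genetic distances, but I did not know how to write a loop in R that batch-processes files and computes the distance for each one. I thought of calling the R function from Python instead, and the tutorials I looked up say this requires the rpy2 module.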
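
One way to approach the batch step is to let Python walk the directory and hand each file to R. The sketch below only builds the R expressions and does not run them. It assumes the alignments are FASTA files readable by read.dna() from the ape package (the package that provides dist.dna()), and that the "raw" model, i.e. the proportion of differing sites, is wanted. The folder name, the *.fasta pattern, and the helper names are hypothetical, and file names containing double quotes are not handled.

```python
from pathlib import Path


def dist_dna_command(fasta_path, model="raw"):
    """Build an R expression computing pairwise distances for one alignment."""
    # as_posix() yields forward slashes, which R also accepts on Windows.
    path = Path(fasta_path).as_posix()
    return (
        f'ape::dist.dna(ape::read.dna("{path}", format = "fasta"), '
        f'model = "{model}")'
    )


def batch_commands(directory, pattern="*.fasta"):
    """Yield (file name, R expression) pairs for every matching file."""
    for fasta in sorted(Path(directory).glob(pattern)):
        yield fasta.name, dist_dna_command(fasta)


if __name__ == "__main__":
    # "alignments" is a hypothetical folder name.
    for name, command in batch_commands("alignments"):
        print(name, "->", command)
```

Each expression could then be passed to R as a string, for example through rpy2's robjects.r(command) or a PypeR instance's r(command).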

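easy_install rpy2 failed with an error (I could not understand the error message);

pip install rpy2 failed (it said pip needed to be upgraded to 19.0.3);

upgrading pip with python -m pip install --upgrade pip failed as well (again, I could not understand the error message);

installing pip by following https://pip.pypa.io/en/stable/installing/ finally upgraded it successfully.

pip install rpy2 still failed (I could not understand the error message);

I tried the tutorial at https://blog.csdn.net/suzyu12345/article/details/51476321 to install rpy2, which reported: rpy2-2.9.5-cp37-cp37m-win_amd64.whl is not a supported wheel on this platform

On the rpy2 home page, https://rpy2.bitbucket.io/, I noticed this sentence: Released source packages are available on PyPi. Installing should be as easy * as

(*:except on Windows)

Does this mean that installing with pip on Windows is not so easy?

I then found the tutorial "rpy2: an example of calling R functions from Python", which installs rpy2 with conda. I tried conda install rpy2 in a Windows command prompt and it succeeded, although the output ended with the message 此时不应有do ("do was unexpected at this time"), which I do not understand.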

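I then looked up how to load R packages from Python through rpy2.

Loading the packages that ship with R worked without problems, but loading a package that has to be installed separately raised an error.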

Following the tutorial at https://blog.csdn.net/arcers/article/details/79109535, I installed the required package with conda install -c r r-ggplot2, which solved the problem.
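
The post does not show the loading code itself. A common way to load an R package in rpy2 is importr from rpy2.robjects.packages, and the sketch below wraps it so that a missing package produces a readable message instead of a traceback. The helper name load_r_package is hypothetical, and the broad except clause is a deliberate simplification for illustration.

```python
def load_r_package(name):
    """Try to import an R package through rpy2; return None on failure."""
    try:
        # Imported lazily so this module still loads without rpy2 or R.
        from rpy2.robjects.packages import importr

        return importr(name)
    except Exception as exc:
        # Covers rpy2 missing, R not found, or the R package not installed.
        print(f"Could not load R package {name!r}: {exc}")
        return None


if __name__ == "__main__":
    stats = load_r_package("stats")      # ships with R
    ggplot2 = load_r_package("ggplot2")  # must be installed separately
    if ggplot2 is None:
        print("Try installing it, e.g. conda install -c r r-ggplot2")
```

Returning None keeps a batch script running when an optional package is absent, at the cost of hiding the exact exception type.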

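A simple measure of how far two sequences have diverged is the proportion of nucleotide sites at which the two sequences differ, P = nd/n;

nd is the number of nucleotide sites that differ between the two sequences and n is the total number of sites compared; P is called the p-distance between the nucleotide sequences.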
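
As a worked illustration of P = nd/n, here is a minimal pure-Python sketch. It assumes the two sequences are already aligned to the same length, and it skips sites where either sequence has a gap or ambiguity character; that skipping rule and the function name are assumptions made for the example, not something the definition prescribes.

```python
def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n = 0   # number of sites compared
    nd = 0  # number of sites that differ
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue  # assumption: ignore gaps and ambiguous sites
        n += 1
        if a != b:
            nd += 1
    if n == 0:
        raise ValueError("no comparable sites")
    return nd / n


# Two of the ten sites differ (positions 5 and 10), so P = 2/10.
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # prints 0.2
```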