R语言:DNA序列比对后计算遗传距离(P-distance)

Python018

R语言:DNA序列比对后计算遗传距离(P-distance),第1张

在R语言中找到了计算遗传距离的函数dist.dna()但是不知道在R里面如何利用循环批量处理文件计算遗传距离。想到了利用python来调用R函数的方法,查找相关教程发现需要用到rpy2模块。

easy_install rpy2 报错(看不懂报错内容);

pip install rpy2 报错(提示需要更新pip到pip19.0.3);

利用 python -m pip install --upgrade pip 更新pip报错(看不懂报错内容);

利用 https://pip.pypa.io/en/stable/installing/ 教程安装pip成功更新。

使用 pip install rpy2 安装依旧报错(看不懂报错内容);

尝试教程 https://blog.csdn.net/suzyu12345/article/details/51476321 安装rpy2,提示 rpy2-2.9.5-cp37-cp37m-win_amd64.whl is not a supported wheel on this platform

在rpy2主页 https://rpy2.bitbucket.io/ 发现一句话 Releasend source packages are available on PyPi. Installing should be as easy * as

(*:except on Windows)

这意思是在windows系统使用pip安装不太容易吗?

找到了教程 rpy2:在python中调用R函数的一个实例 ;发现其中rpy2的安装使用的是conda,自己也尝试在windows的DOS窗口下使用 conda install rpy2 成功。但是结尾处提示了一句 此时不应有do 不明白是什么意思

在python中加载R包查到可以使用

加载R语言自带的包时没有遇到问题;但是加载需要额外安装的包时遇到了报错

按照教程 https://blog.csdn.net/arcers/article/details/79109535 使用 conda install -c r r-ggplot2 安装需要用到的包解决问题

一个简便描述序列分歧大小的测度是两条核苷酸序列中不同核苷酸位点的比例 P = nd/n;

nd为检测两条序列间不同核苷酸数;n为配对总数;P成为核苷酸间的p距离

以下为该包的帮助文件内容

Dynamic nomogram to visualise statistical models

Description

DynNom is a generic function to display the results of statistical model objects as a dynamic nomogram in an 'RStudio' panel or web browser. DynNom supports a large number of model objects from a variety of packages.

Usage

DynNom(model, data = NULL, clevel = 0.95, m.summary = c("raw", "formatted"),

covariate = c("slider", "numeric"), ptype = c("st", "1-st"),

DNtitle = NULL, DNxlab = NULL, DNylab = NULL, DNlimits = NULL,

KMtitle = NULL, KMxlab = NULL, KMylab = NULL)

DynNom.core(model, data, clevel, m.summary, covariate, DNtitle, DNxlab, DNylab, DNlimits)

DynNom.surv(model, data, clevel, m.summary, covariate,

ptype, DNtitle, DNxlab, DNylab, KMtitle, KMxlab, KMylab)

Arguments

model

an lm, glm, coxph, ols, Glm, lrm, cph, mgcv::gam or gam::gam model objects.

data

a dataframe of the accompanying dataset for the model (if required).

clevel

a confidence level for constructing the confidence interval. If not specified, a 95% level will be used.

m.summary

an option to choose the type of the model output represented in the 'Summary Model' tab. "raw" (the default) returns an unformatted summary of the model"formatted" returns a formatted table of the model summary using stargazer package.

covariate

an option to choose the type of input control widgets used for numeric values. "slider" (the default) picks out sliderInput"numeric" picks out numericInput.

ptype

an option for coxph or cph model objects to choose the type of plot which displays in "Survival plot" tab. "st" (the default) returns plot of estimated survivor probability (S(t)). "1-st" returns plot of estimated failure probability (1-S(t)).

DNtitle

a character vector used as the app's title. If not specified, "Dynamic Nomogram" will be used.

DNxlab

a character vector used as the title for the x-axis in "Graphical Summary" tab. If not specified, "Probability" will be used for logistic model and Cox proportional model objectsor "Response variable" for other model objects.

DNylab

a character vector used as the title for the y-axis in "Graphical Summary" tab (default is NULL).

DNlimits

a vector of 2 numeric values used to set x-axis limits in "Graphical Summary" tab. Note: This also removes the 'Set x-axis ranges' widget in the sidebar panel.

KMtitle

a character vector used as KM plot's title in "Survival plot" tab. If not specified, "Estimated Survival Probability" for ptype = "st" and "Estimated Probability" for ptype = "1-st" will be used.

KMxlab

a character vector used as the title for the x-axis in "Survival plot" tab. If not specified, "Follow Up Time" will be used.

KMylab

a character vector used as the title for the y-axis in "Survival plot" tab. If not specified, "S(t)" for ptype = "st" and "F(t)" for ptype = "1-st" will be used.

Value

A dynamic nomogram in a shiny application providing individual predictions which can be used as a model visualisation or decision-making tools.

The individual predictions with a relative confidence interval are calculated using the predict function, displaying either graphically as an interactive plot in the Graphical Summary tab or a table in the Numerical Summary tab. A table of model output is also available in the Model Summary tab. In the case of the Cox proportional hazards model, an estimated survivor/failure function will be additionally displayed in a new tab.

Please cite as:

Jalali, A., Roshan, D., Alvarez-Iglesias, A., Newell, J. (2019). Visualising statistical models using dynamic nomograms. R package version 5.0.

Author(s)

Amirhossein Jalali, Davood Roshan, Alberto Alvarez-Iglesias, John Newell

Maintainer: Amirhossein Jalali [email protected]

References

Banks, J. 2006. Nomograms. Encyclopedia of Statistical Sciences. 8.

Easy web applications in R. http://shiny.rstudio.com

Frank E Harrell Jr (2017). rms: Regression Modeling Strategies. R package version 4.5-0. https://CRAN.R-project.org/package=rms

See Also

DNbuilder, getpred.DN