How to match expression values to ENSEMBL IDs in R

Python033


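Text processing is an important part of working with R, and the most powerful text-processing tools in R are the general-purpose functions of the grep() family, which need no additional package.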
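As a small illustration of the grep() family applied to gene identifiers, the sketch below keeps only the entries that look like Ensembl gene IDs and strips a trailing version suffix. The vector `ids` and its version numbers are made up for the example, and whether your own IDs carry such suffixes is an assumption to check.

```r
# Toy vector of IDs; the version suffixes (".12", ".3") are invented for illustration.
ids <- c("ENSG00000197579.12", "ENSG00000123096", "ACTB", "ENSG00000143815.3")

# grepl() returns TRUE where the pattern matches: keep only Ensembl gene IDs.
is_ensembl <- grepl("^ENSG[0-9]+", ids)
ensembl_ids <- ids[is_ensembl]

# sub() replaces the first match: drop a trailing ".<version>" if present.
clean_ids <- sub("\\.[0-9]+$", "", ensembl_ids)
print(clean_ids)
```

Stripping the version first matters if the IDs are later used with an "ensembl_gene_id" filter, which expects unversioned IDs.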

The following uses human genes as an example:

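library(biomaRt)

# Connect to the Ensembl BioMart and select the human gene dataset.
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))

# The Ensembl gene IDs are the row names of the marker table.
genes <- rownames(pbmc.markers)

# Look up the HGNC symbol for each Ensembl gene ID.
G_list <- getBM(filters = "ensembl_gene_id", attributes = c("ensembl_gene_id", "hgnc_symbol"), values = genes, mart = mart)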

Here genes is a character vector. Printed, it looks like: "ENSG00000197579" "ENSG00000123096" "ENSG00000143815" "ENSG00000118257"
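Once getBM() has returned G_list, the mapping still has to be attached to the expression values. The following is a minimal sketch of that step with base R's match(), under stated assumptions: `expr` is a toy matrix with invented values, and the `G_list` here is a hand-written stand-in with placeholder symbols, not real getBM() output. The result from getBM() is not guaranteed to follow the input order and can omit IDs or contain empty symbols, so the sketch falls back to the Ensembl ID in those cases.

```r
# Toy expression matrix with Ensembl IDs as row names (values are made up).
expr <- matrix(c(5.1, 0.0, 2.3, 7.8,
                 4.9, 0.2, 2.0, 8.1),
               nrow = 4,
               dimnames = list(c("ENSG00000197579", "ENSG00000123096",
                                 "ENSG00000143815", "ENSG00000118257"),
                               c("sample1", "sample2")))

# Stand-in for the data frame returned by getBM(); symbols are placeholders.
G_list <- data.frame(
  ensembl_gene_id = c("ENSG00000143815", "ENSG00000197579", "ENSG00000118257"),
  hgnc_symbol     = c("SYMBOL_C", "SYMBOL_A", ""),
  stringsAsFactors = FALSE
)

# For each row of expr, find the position of its ID in G_list (NA if absent).
idx <- match(rownames(expr), G_list$ensembl_gene_id)
symbol <- G_list$hgnc_symbol[idx]

# Fall back to the Ensembl ID when no symbol is available.
missing <- is.na(symbol) | symbol == ""
symbol[missing] <- rownames(expr)[missing]

# Combine IDs, symbols, and expression values into one table.
annotated <- data.frame(ensembl_gene_id = rownames(expr),
                        symbol = symbol,
                        expr,
                        row.names = NULL,
                        stringsAsFactors = FALSE)
print(annotated)
```

match() keeps the row order of the expression matrix, which merge() would not do by default. If several Ensembl IDs map to the same symbol, the symbol column will contain duplicates and should not be used directly as row names.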

Protein accession numbers, which look like NP_005537, can be converted with a Gene ID conversion tool, or with R packages such as clusterProfiler and biomaRt.

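library(biomaRt)

# Read strings as character, not factor (already the default from R 4.0 on).
options(stringsAsFactors = FALSE)

# One mart per species: human and mouse.
human <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")

mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

# Read the file one line per row; each line is a tab-separated gene set.
mouse.genes <- read.table('color.signatures.txt', sep = '\n', header = FALSE)

# Split the third line into its tab-separated fields.
geneset <- strsplit(mouse.genes[3, ], split = '\t')[[1]]

# Map mouse MGI symbols to human HGNC symbols, skipping the first two fields of the line.
genes.map <- getLDS(mart = mouse, attributes = c("mgi_symbol"), filters = "mgi_symbol", values = geneset[c(-1, -2)], attributesL = c("hgnc_symbol"), martL = human, uniqueRows = TRUE)

# Write comma-separated output so that the content matches the .csv extension.
write.table(genes.map, "mouse.to.human.genes.csv", sep = ",", row.names = FALSE, col.names = TRUE, quote = FALSE)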
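An ortholog table from getLDS() is often not one-to-one: a mouse gene may map to several human symbols, or to none. The sketch below shows one possible way to keep only the mouse genes that have a single human symbol. It uses a hand-written toy `genes.map` with placeholder gene names, and it assumes the columns are named `MGI.symbol` and `HGNC.symbol`; check colnames(genes.map) on your own result before relying on that.

```r
# Toy stand-in for the getLDS() result; gene names are placeholders and
# the column names MGI.symbol / HGNC.symbol are an assumption.
genes.map <- data.frame(
  MGI.symbol  = c("GeneA", "GeneB", "GeneB", "GeneC"),
  HGNC.symbol = c("GENEA", "GENEB1", "GENEB2", ""),
  stringsAsFactors = FALSE
)

# Drop rows without a human symbol.
genes.map <- genes.map[genes.map$HGNC.symbol != "", ]

# Count the human symbols per mouse gene and keep genes with exactly one.
n_human <- table(genes.map$MGI.symbol)
single <- names(n_human)[n_human == 1]
one_to_one <- genes.map[genes.map$MGI.symbol %in% single, ]
print(one_to_one)
```

Whether to drop ambiguous genes or keep all their human counterparts depends on the downstream analysis; this filter is only one choice.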

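library(org.Hs.eg.db)

library(clusterProfiler)

# geneset must be a character vector of Entrez gene IDs at this point.
# Do not call rm(list=ls()) before this line: it would delete geneset.
# bitr() returns a data frame; keep only the SYMBOL column.
geneset <- bitr(geneset, fromType = "ENTREZID", toType = c("SYMBOL"), OrgDb = org.Hs.eg.db)$SYMBOL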