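# Collect the indices of the columns that contain at least one missing value
na_cols <- c()
for (i in seq_len(ncol(x))) {
  # complete.cases(x[i]) is FALSE for every row in which column i is NA
  if (sum(!complete.cases(x[i])) >= 1) {
    na_cols <- c(na_cols, i)
  }
}
# Drop those columns; skip this when none were found, because x[-c()] would drop every column
if (length(na_cols) > 0) x <- x[-na_cols]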
}
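The loop above can also be written without explicit indexing. The following is a minimal sketch, assuming the input is a data frame; `demo_df` and `clean_df` are made-up names and the data is invented for illustration.

```r
# Invented example data: columns b and d each contain one NA
demo_df <- data.frame(
  a = c(1, 2, 3),
  b = c(NA, 5, 6),
  d = c("u", NA, "w"),
  e = c(TRUE, FALSE, TRUE)
)

# is.na() gives a logical matrix; colSums() counts the NAs in each column
keep <- colSums(is.na(demo_df)) == 0

# drop = FALSE keeps the result a data frame even if only one column survives
clean_df <- demo_df[, keep, drop = FALSE]
```

Because `keep` is a logical vector rather than a vector of negative indices, this form also behaves sensibly when no column contains a missing value.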
The data may have been edited on Windows and saved as UTF-16, in which case reading it with R can fail as shown below. There are three ways to fix this.
> sampInfo=read.table("/media/xxx/sampInfo_origin.txt", na.strings=c("", "NA"), sep="\t", header=T)
Error in make.names(col.names, unique = TRUE) :
  invalid multibyte string at '<ff><fe>R'
In addition: Warning messages:
1: In read.table("/media/xxx/sampInfo_origin.txt", :
  line 1 appears to contain embedded nulls
2: In read.table("/media/xxx/sampInfo_origin.txt", :
  line 2 appears to contain embedded nulls
3: In read.table("/media/xxx/sampInfo_origin.txt", :
  line 3 appears to contain embedded nulls
4: In read.table("/media/xxx/sampInfo_origin.txt", :
  line 4 appears to contain embedded nulls
5: In read.table("/media/albert/xxx/sampInfo_origin.txt", :
  line 5 appears to contain embedded nulls
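Solution 1: pass fileEncoding="UTF16LE" or fileEncoding="UTF16" to read.table. The set of accepted encoding names depends on the platform, so the hyphenated spellings "UTF-16LE" and "UTF-16" may be needed on some systems.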
> sampInfo=read.table("/media/xxx/sampInfo_origin.txt", fileEncoding="UTF16LE", sep="\t", header=T)
> sampInfo=read.table("/media/xxx/sampInfo_origin.txt", fileEncoding="UTF16", sep="\t", header=T)
> head(sampInfo)
       Run Sample_Name age ancestry arthropathymeds biologics das_score
1 SRRxxx72    GSMxxx25  66     <NA>            <NA>      <NA>        NA
2 SRRxxx73    GSMxxx26  72     <NA>            <NA>      <NA>        NA
3 SRRxxx75    GSMxxx28  61     <NA>            <NA>      <NA>        NA
4 SRRxxx74    GSMxxx27  72     <NA>            <NA>      <NA>        NA
5 SRRxxx76    GSMxxx29  50     <NA>            <NA>      <NA>        NA
6 SRRxxx77    GSMxxx30  59     <NA>            <NA>      <NA>        NA
  disease_activity donor gender leflumide nsaids othermeds phenotype
1             <NA>  C137   male      <NA>   <NA>      <NA>   Healthy
2             <NA>  C141   male      <NA>   <NA>      <NA>   Healthy
3             <NA>  C383   male      <NA>   <NA>      <NA>   Healthy
4             <NA>  C148 female      <NA>   <NA>      <NA>   Healthy
5             <NA>  C391 female      <NA>   <NA>      <NA>   Healthy
6             <NA>  C392 female      <NA>   <NA>      <NA>   Healthy
  classification status plaquenil rituximab steroids sulfasalazine tissue
1              H      H      <NA>      <NA>     <NA>          <NA>  Blood
2              H      H      <NA>      <NA>     <NA>          <NA>  Blood
3              H      H      <NA>      <NA>     <NA>          <NA>  Blood
4              H      H      <NA>      <NA>     <NA>          <NA>  Blood
5              H      H      <NA>      <NA>     <NA>          <NA>  Blood
6              H      H      <NA>      <NA>     <NA>          <NA>  Blood
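The bytes `<ff><fe>` in the error message are the UTF-16 little-endian byte-order mark, so the encoding can be guessed before calling read.table. The following is a rough sketch, not a general encoding detector: `guess_encoding` is a hypothetical helper, the file contents are invented, and only the little-endian mark is checked.

```r
# Build a small UTF-16LE file that starts with a byte-order mark (invented data)
demo_path <- tempfile(fileext = ".txt")
body <- iconv("Run\tage\nS1\t66\nS2\t72\n",
              from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), body), demo_path)

# Hypothetical helper: look at the first two bytes of the file
guess_encoding <- function(path) {
  bom <- readBin(path, what = "raw", n = 2)
  if (identical(bom, as.raw(c(0xff, 0xfe)))) {
    return("UTF-16LE")
  }
  ""  # empty string: read.table falls back to the native encoding
}

enc <- guess_encoding(demo_path)
demo <- read.table(demo_path, fileEncoding = enc, sep = "\t", header = TRUE)
```

Here `enc` is passed straight to the `fileEncoding` argument, which is the same argument used in Solution 1.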
Solution 2: open the file in Excel and save it as a CSV file.
> sampInfo=read.csv("/media/xxx/sampInfo_origin.csv", comment.char = "#", sep=",", header=T)
> head(sampInfo)
       Run Sample_Name age ancestry arthropathymeds biologics das_score
1 SRRxxx72    GSMxxx25  66     <NA>            <NA>      <NA>        NA
2 SRRxxx73    GSMxxx26  72     <NA>            <NA>      <NA>        NA
3 SRRxxx75    GSMxxx28  61     <NA>            <NA>      <NA>        NA
4 SRRxxx74    GSMxxx27  72     <NA>            <NA>      <NA>        NA
5 SRRxxx76    GSMxxx29  50     <NA>            <NA>      <NA>        NA
6 SRRxxx77    GSMxxx30  59     <NA>            <NA>      <NA>        NA
  disease_activity donor gender leflumide nsaids othermeds phenotype
1             <NA>  C137   male      <NA>   <NA>      <NA>   Healthy
2             <NA>  C141   male      <NA>   <NA>      <NA>   Healthy
3             <NA>  C383   male      <NA>   <NA>      <NA>   Healthy
4             <NA>  C148 female      <NA>   <NA>      <NA>   Healthy
5             <NA>  C391 female      <NA>   <NA>      <NA>   Healthy
6             <NA>  C392 female      <NA>   <NA>      <NA>   Healthy
  classification status plaquenil rituximab steroids sulfasalazine tissue
1              H      H      <NA>      <NA>     <NA>          <NA>  Blood
2              H      H      <NA>      <NA>     <NA>          <NA>  Blood
3              H      H      <NA>      <NA>     <NA>          <NA>  Blood
4              H      H      <NA>      <NA>     <NA>          <NA>  Blood
5              H      H      <NA>      <NA>     <NA>          <NA>  Blood
6              H      H      <NA>      <NA>     <NA>          <NA>  Blood
Solution 3: on Linux, open sampInfo_origin.txt in gedit and save it as sampInfo_origin01.txt, setting "Character Encoding" to UTF-8 and "Line ending" to "Unix/Linux".
>sampInfo=read.table("/media/xxx
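The conversion described in Solution 3 can also be done from R instead of gedit. This is a minimal sketch under the assumption that the source file is UTF-16LE with Windows line endings; `convert_to_utf8` is a hypothetical helper, and the file names and contents are temporary, invented ones.

```r
# Hypothetical helper: re-encode a text file as UTF-8 with Unix line endings
convert_to_utf8 <- function(src, dest, from = "UTF-16LE") {
  con_in <- file(src, open = "r", encoding = from)
  lines <- readLines(con_in, warn = FALSE)  # accepts LF, CRLF or CR as line end
  close(con_in)
  # Binary mode so that "\n" is not expanded to "\r\n" on Windows
  con_out <- file(dest, open = "wb")
  writeLines(enc2utf8(lines), con_out, sep = "\n", useBytes = TRUE)
  close(con_out)
  invisible(dest)
}

# Demonstration on a temporary UTF-16LE file with CRLF line endings
src <- tempfile(fileext = ".txt")
dest <- tempfile(fileext = ".txt")
raw_in <- iconv("Run\tage\r\nS1\t66\r\n",
                from = "UTF-8", to = "UTF-16LE", toRaw = TRUE)[[1]]
writeBin(c(as.raw(c(0xff, 0xfe)), raw_in), src)

convert_to_utf8(src, dest)
converted <- read.table(dest, sep = "\t", header = TRUE)
```

After the conversion the file can be read with a plain read.table call, without the `fileEncoding` argument.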
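The cause is probably Chinese characters in the installation directory. The default location where installed packages are stored must also be a path without Chinese characters. On some computers the My Documents folder has a Chinese name, for example "我的资料库", and that triggers this error. Renaming it to an English name fixes the problem. It is also best to avoid spaces in the path. I hope this helps.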
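A quick way to check the relevant paths for these two problems is sketched below. `path_problems` is a hypothetical helper, and the assumption that `R.home()` and `.libPaths()` are the paths that matter is mine rather than something the answer above states.

```r
# Hypothetical helper: flag non-ASCII bytes and spaces in a single path
path_problems <- function(p) {
  bytes <- as.integer(charToRaw(p))
  c(non_ascii = any(bytes > 127L), has_space = any(bytes == 32L))
}

# Check the R installation directory and every package library location
paths <- c(R.home(), .libPaths())
report <- vapply(paths, path_problems,
                 FUN.VALUE = c(non_ascii = FALSE, has_space = FALSE))
```

Each column of `report` corresponds to one path; a TRUE in either row marks a path worth renaming or moving.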