Next, we use the sqldf package to handle grouped summaries.
Because sqldf does not ship with R, install it first with the following code:
install.packages("sqldf")
When prompted, choose a CRAN mirror such as "China (Beijing)". R will automatically install the packages that sqldf depends on at the same time.
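If you would rather skip the interactive mirror prompt, the mirror can be passed directly to install.packages(). The following is a minimal sketch, assuming the CRAN cloud mirror URL (not mentioned in the original; any CRAN mirror URL should work the same way). The variable name cran_mirror is only illustrative.

```r
# Assumption: the CRAN cloud mirror is used here; substitute any mirror you prefer
cran_mirror <- "https://cloud.r-project.org"

# Install sqldf only if it is not already available;
# its dependencies are installed along with it by default
if (!requireNamespace("sqldf", quietly = TRUE)) {
  install.packages("sqldf", repos = cran_mirror)
}
```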
Once sqldf and its dependencies are installed, load the package with the following code:
library(sqldf)
With everything ready, use sqldf to compute each student's total score and average score:
sqldf("select name,sum(score) as score_sum,avg(score) as score_avg from Mydata group by name")
Compute the total score for each class:
sqldf("select class,sum(score) as score_sum from Mydata group by class")
Compute the total score and average score for each course within each class:
sqldf("select class,course,sum(score) as score_sum,avg(score) as score_avg from Mydata group by class,course")
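The queries above assume a data frame named Mydata with the columns name, class, course, and score. The sketch below builds a small hypothetical Mydata (the names and scores are made up for illustration, not taken from the original) so the queries can be tried end to end, and cross-checks the per-class totals with base R's aggregate().

```r
library(sqldf)

# Hypothetical sample data (not from the original);
# the column names follow the queries above
Mydata <- data.frame(
  name   = c("Zhang", "Zhang", "Li", "Li"),
  class  = c("A", "A", "B", "B"),
  course = c("Math", "English", "Math", "English"),
  score  = c(90, 80, 70, 60),
  stringsAsFactors = FALSE
)

# Total and average score per student
by_name <- sqldf("select name, sum(score) as score_sum, avg(score) as score_avg
                  from Mydata group by name")

# Total score per class
by_class <- sqldf("select class, sum(score) as score_sum from Mydata group by class")

# Cross-check the per-class totals with base R's aggregate()
check <- aggregate(score ~ class, data = Mydata, FUN = sum)

print(by_name)
print(by_class)
print(check)
```

sqldf looks up Mydata by name in the calling environment, so the data frame only needs to exist in the workspace before the query runs.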
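From the menu, choose Edit > Replace, enter 2010 in the Find field and 2011 in the Replace field, then click Replace All. To avoid replacing occurrences of 2010 that should stay unchanged, you can enter something more specific, such as 2010.xls and 2011.xls, if applicable.
1. Create a data frame:
a <- data.frame("geneid1"=rep("TabHLH1",3),"geneid2"=c("TabHLH2.1","TabHLH2.2","TabHLH2.3"),"geneid3"=rep("TabHLH3",3))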
The result is:
geneid1 geneid2 geneid3
1 TabHLH1 TabHLH2.1 TabHLH3
2 TabHLH1 TabHLH2.2 TabHLH3
3 TabHLH1 TabHLH2.3 TabHLH3
Load the packages:
library(dplyr)
library(tidyr)
Split the second column at the "." character:
b <- a %>% separate(geneid2, c("gene","id"), "[.]")
The result is:
geneid1 gene id geneid3
1 TabHLH1 TabHLH2 1 TabHLH3
2 TabHLH1 TabHLH2 2 TabHLH3
3 TabHLH1 TabHLH2 3 TabHLH3
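The separator passed to separate() is interpreted as a regular expression, which is why the dot is written as "[.]" above: a bare "." would match any character. The sketch below shows an equivalent form using the escaped pattern "\\." and adds convert = TRUE so that the new id column is stored as an integer rather than as text. The name b2 is only illustrative, and stringsAsFactors = FALSE is set so the example behaves the same across R versions.

```r
library(tidyr)

# Same data frame as above, with character (not factor) columns
a <- data.frame(
  geneid1 = rep("TabHLH1", 3),
  geneid2 = c("TabHLH2.1", "TabHLH2.2", "TabHLH2.3"),
  geneid3 = rep("TabHLH3", 3),
  stringsAsFactors = FALSE
)

# sep is a regular expression, so the dot is escaped as "\\."
# (equivalent to the character class "[.]").
# convert = TRUE converts the new columns to a suitable type,
# so id becomes integer while gene stays character.
b2 <- separate(a, geneid2, into = c("gene", "id"), sep = "\\.", convert = TRUE)
str(b2)
```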