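1. First, open RStudio and create a new script file: [File] > [New File] > [R Script].
2. Objects left in the global environment by earlier code can get in the way of later work. They can be cleaned up as follows:
3. Run rm(A) to remove the object named A (rm = remove).
4. Comparing the environment before and after shows that the object has been removed.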
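The cleanup steps above can be sketched as follows. This is a minimal illustration, assuming two made-up objects, A and B, stand in for whatever earlier code left behind; they are not from the original session.

```r
# Hypothetical leftovers from earlier code (names chosen for illustration)
A <- c(1, 2, 3)
B <- "still needed"

ls()                            # lists the objects in the current environment

rm(A)                           # remove only A

exists("A", inherits = FALSE)   # FALSE: A is gone from this environment
exists("B", inherits = FALSE)   # TRUE: B was not touched

# rm(list = ls())               # would remove every object here; use with care
```

Passing inherits = FALSE keeps exists() from looking in other environments, so the check reflects only the workspace that rm() just changed.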
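Learning R: data cleaning and transformation, working with strings
The grep, grepl and regexpr functions all find strings that match a pattern; the sub and gsub functions replace the matched text.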
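As a rough sketch of how these base R functions differ, the snippet below runs each of them on a small made-up vector x (an assumption for illustration, not part of the original data set).

```r
# Hypothetical sample strings
x <- c("East Anglia, Mercia", "Kent", "Kent, Wessex")

grep(",", x)      # indices of the elements that match: 1 3
grepl(",", x)     # one logical per element: TRUE FALSE TRUE
regexpr(",", x)   # position of the first match in each element, -1 if none

sub("e", "E", x)  # replaces only the first match in each element
gsub("e", "E", x) # replaces every match in each element
```

grep and grepl answer "which elements match", regexpr answers "where is the match", and sub versus gsub differ only in whether the first or every match is replaced.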
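Load the stringr package. fixed() wraps the string to be matched literally, and str_detect returns a logical vector marking which elements contain a match; that vector is then used to select the matching rows.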
> library(stringr)
> multiple <- str_detect(english_monarchs$domain, fixed(","))
> english_monarchs[multiple, c("name", "domain")]
    name                         domain
17  Offa                         East Anglia, Mercia
18  Offa                         East Anglia, Kent, Mercia
19  Offa and Ecgfrith            East Anglia, Kent, Mercia
20  Ecgfrith                     East Anglia, Kent, Mercia
22  Cœnwulf                      East Anglia, Kent, Mercia
23  Cœnwulf and Cynehelm         East Anglia, Kent, Mercia
24  Cœnwulf                      East Anglia, Kent, Mercia
25  Ceolwulf                     East Anglia, Kent, Mercia
26  Beornwulf                    East Anglia, Mercia
82  Ecgbehrt and Æthelwulf       Kent, Wessex
83  Ecgbehrt and Æthelwulf       Kent, Mercia, Wessex
84  Ecgbehrt and Æthelwulf       Kent, Wessex
85  Æthelwulf and Æeelstan I     Kent, Wessex
86  Æthelwulf                    Kent, Wessex
87  Æthelwulf and Æeelberht III  Kent, Wessex
88  Æeelberht III                Kent, Wessex
89  Æthelred I                   Kent, Wessex
95  Oswiu                        Mercia, Northumbria
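For readers without the english_monarchs data at hand, here is a small self-contained sketch of the same filtering step. The monarchs data frame is a hypothetical four-row stand-in, and the last two lines show why fixed() can matter when the search string is a regular-expression metacharacter.

```r
library(stringr)

# Hypothetical miniature of the data set, for illustration only
monarchs <- data.frame(
  name   = c("Offa", "Ecgfrith", "Oswiu", "Wuffa"),
  domain = c("East Anglia, Mercia", "East Anglia, Kent, Mercia",
             "Mercia, Northumbria", "East Anglia"),
  stringsAsFactors = FALSE
)

# TRUE where the domain contains a literal comma
multiple <- str_detect(monarchs$domain, fixed(","))
multiple                                  # TRUE TRUE TRUE FALSE

# Use the logical vector to pick rows, and a character vector to pick columns
monarchs[multiple, c("name", "domain")]

# Without fixed(), the pattern is a regular expression: "." matches any character
str_detect("Kent", ".")                   # TRUE
str_detect("Kent", fixed("."))            # FALSE
```

For a comma the result is the same with or without fixed(), since a comma has no special meaning in a regular expression; fixed() mainly states the intent and avoids surprises with characters such as "." or "(".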
A regular expression can match several alternatives at once. Here the pattern matches either a comma or the word "and":
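> ruler <- str_detect(english_monarchs$name, ",|and")
> # the trailing comma selects rows; without it the logical vector would be applied to columns
> english_monarchs[ruler & !is.na(ruler), ]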
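The role of !is.na() in the step above may be easier to see on a tiny example. The sketch below assumes a made-up name vector that includes a missing value; it also points out one caveat of the ",|and" pattern, namely that "and" is matched anywhere, even inside a longer word.

```r
library(stringr)

# Hypothetical names, including a missing value
name <- c("Offa and Ecgfrith", "Hun, Beonna and Alberht", "Offa", NA)

ruler <- str_detect(name, ",|and")
ruler                           # TRUE TRUE FALSE NA

# NA in a logical index would produce an NA entry, so it is filtered out first
name[ruler & !is.na(ruler)]

# Caveat: the alternation also matches "and" inside a word
str_detect("Alexander", ",|and")     # TRUE
str_detect("Alexander", ",| and ")   # FALSE: requires spaces around "and"
```

Whether the stricter pattern is needed depends on the names in the data; it is shown here only as one possible refinement.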
To split the name column into individual rulers, use the str_split function:
> indival <- str_split(english_monarchs$name, ",|and")
> head(indival[sapply(indival, length) > 1])
[[1]]
[1] "Sigeberht " " Ecgric"
[[2]]
[1] "Hun" " Beonna " " Alberht"
[[3]]
[1] "Offa " " Ecgfrith"
[[4]]
[1] "Cœnwulf " " Cynehelm"
[[5]]
[1] "Sighere " " Sebbi"
[[6]]
[1] "Sigeheard " " Swaefred"
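The pieces returned by str_split keep the spaces that surrounded the separators, as the output above shows. One way to tidy them, sketched here on a made-up name vector, is to apply str_trim to every element of the list.

```r
library(stringr)

# Hypothetical names for illustration
name <- c("Sigeberht and Ecgric", "Hun, Beonna and Alberht", "Offa")

# str_split returns a list with one character vector per input string
parts <- str_split(name, ",|and")
parts[[2]]                 # "Hun" " Beonna " " Alberht"

# Strip leading and trailing whitespace from every piece
trimmed <- lapply(parts, str_trim)
trimmed[[2]]               # "Hun" "Beonna" "Alberht"

# Number of rulers found in each original string
lengths(trimmed)           # 2 3 1
```

lengths() is the base R shortcut for sapply(x, length), which the original code uses to keep only the entries with more than one ruler.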
str_count counts how many times a pattern occurs in each string:
> str_count(english_monarchs$name, "th")
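A short sketch of what str_count returns, using three made-up strings in place of the name column. Note that the pattern has to be a quoted string; an unquoted th would be looked up as a variable.

```r
library(stringr)

# Hypothetical sample strings
x <- c("Ecgfrith", "Offa", "Ethelthryth")

# One count per input string
str_count(x, "th")         # 1 0 3

# Total number of occurrences across the whole vector
sum(str_count(x, "th"))    # 4
```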
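The str_replace function replaces the first match of a pattern in each string; str_replace_all replaces every match.
ignore.case makes the match case-insensitive; it does not skip a particular character or string. In current versions of stringr the same effect is written as regex(pattern, ignore_case = TRUE).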
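The following sketch illustrates replacement and case-insensitive matching on two made-up strings. It uses the regex(..., ignore_case = TRUE) form, on the assumption that a recent stringr is installed; the older ignore.case() helper is not used here.

```r
library(stringr)

# Hypothetical sample strings
x <- c("Kent, Wessex", "KENT and Mercia")

# First match only versus every match
str_replace(x, "e", "E")       # "KEnt, Wessex" "KENT and MErcia"
str_replace_all(x, "e", "E")   # "KEnt, WEssEx" "KENT and MErcia"

# Matching is case-sensitive by default
str_detect(x, "kent")          # FALSE FALSE

# regex() with ignore_case = TRUE matches regardless of case
str_detect(x, regex("kent", ignore_case = TRUE))              # TRUE TRUE
str_replace(x, regex("kent", ignore_case = TRUE), "Sussex")   # "Sussex, Wessex" "Sussex and Mercia"
```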