R语言自学笔记-2内置数据集

Python09

R语言自学笔记-2内置数据集,第1张

#b站视频——R语言入门与数据分析

#内置数据集

#固定格式的数据(矩阵、数据框或一个时间序列等)

#统计建模、回归分析等试验需要找合适的数据集

#R内置数据集,存储在,通过

help(package="datasets")

#通过data函数访问这些数据集

data()

#得到新窗口  前面:数据集名字  后面:内容

#包含R所有用到的数据类型,包括:向量、矩阵、列表、因子、数据框以及时间序列等

#直接输入数据集的名字就可以直接使用这些数据集

#输出一个向量

rivers

#是北美141条河流长度

#这些数据集的名字都是内置的,一般我们在给变量命名时最好不要重复

#否则数据集在当前对话中会被置换掉

#例如

rivers<-c(1,2,3)

rivers

#不过影响不大

#再使用data函数重新加载这个数据集就可以了

data("rivers")

rivers

#一些常用内置数据集

#默认介绍页面只有名字和介绍,并没有给出数据分类

#哪些是向量、矩阵、数据框等?

#查看数据集除了直接敲数据集名字显示数据之外

#还可以使用help函数查看每个数据集具体的内容

help("mtcars")

euro

#欧元汇率,长度为11,每个元素都有命名

#输出向量的属性信息

names(euro)

#将5个数据构成一个数据框

向量

state.abb #美国50个州的双字母缩写

state.area #美国50个州的面积

state.name #美国50个州的全称

因子

state.division #美国50个州的分类,9个类别

state.region #美国50个州的地理分类

#

state<-data.frame(state.name,state.abb,state.area,state.division,state.region)

state

state.x77 #美国50个州的八个指标

state.x77

VADeaths #1940年弗吉尼亚州死亡率(每千人)

volcano #某火山区的地理信息(10米×10米的网格)

WorldPhones #8个区域在7个年份的电话总数

iris3 #3种鸢尾花形态数据

#以上矩阵→适合画热图

heatmap(volcano)

#这里只是作为一个演示,还需要对这个图进行一些调整

#更复杂的数据结构

Titanic #泰坦尼克乘员统计,是一个数组

UCBAdmissions #伯克利分校1973年院系、录取和性别的频数

crimtab #3000个男性罪犯左手中指长度和身高关系

HairEyeColor #592人头发颜色、眼睛颜色和性别的频数

occupationalStatus #英国男性父子职业联系

#类矩阵

eurodist #欧洲12个城市的距离矩阵,只有下三角部分

Harman23.cor #305个女孩八个形态指标的相关系数矩阵

Harman74.cor #145个儿童24个心理指标的相关系数矩阵

#R中内置最多的数据集——数据框

cars #1920年代汽车速度对刹车距离的影响

iris #3种鸢尾花形态数据

mtcars #32辆汽车在11个指标上的数据

rock #48块石头的形态数据

sleep #两药物的催眠效果

swiss #瑞士生育率和社会经济指标

trees #树木形态指标

USArrests #美国50个州的四个犯罪率指标

women #15名女性的身高和体重

#列表

state.center #美国50个州中心的经度和纬度

#类数据框

Orange #桔子树生长数据

#时间序列数据,和数据框类似,不同的是具有时间序列的顺序,是数据分析中非常常见的格式

#能反映出变化情况以及变化的趋势等

#因此有很多专门的方法用于时间序列的数据分析

co2 #1959-1997年每月大气co2浓度(ppm)

presidents #1945-1974年每季度美国总统支持率

uspop #1790–1970美国每十年一次的人口总数(百万为单位)

#除了内置数据集之外,许多R扩展包中也内置了很多数据集

#这些数据集作为扩展包的函数使用的案例

#加载R包之后这些数据集也同样被加载进来

#例如MASS包中的Cars93数据

#包含了27个变量,是1993年93辆汽车的型号指标

install.packages("MASS")

library("MASS")

help("Cars93")

#使用data函数在参数package中等于对应R包的名字,即可列出每个R包中包含的数据集

#ex

data(package="MASS")

#显示R中所有可用的数据集

data(package=.packages(all.available = TRUE))

#不加载R包使用其中的数据集

data(Chile,package="car")

Chile

#>data(Chile,package="car")

#Warning message:

#  In data(Chile, package = "car") : data set ‘Chile’ not found

#>Chile

#Error: object 'Chile' not found

install.packages("car")

library("car")

help("Chile")

可以使用数据标号“text()”函数text()函数跟在画图函数语句后面,即先画出图,再标号。

下面为来自R的text()函数使用方法(疑难词汇已经标出)

Description

text draws the strings given in the vector(矢量) labels at the coordinates(坐标) given by x and y. y may be missing since xy.coords(x, y) is used for construction of the coordinates.

Usage

text(x, ...)

## Default S3 method:

text(x, y = NULL, labels = seq_along(x$x), adj = NULL,pos = NULL, offset = 0.5, vfont =NULL,cex = 1, col = NULL, font = NULL, ...)

Arguments

x, y

numeric(数) vectors(矢量) of coordinates(坐标) where the text labels should be written. If the length of x and y differs, the shorter one is recycled.

labels

a character vector or expression specifying the text to be written. An attempt is made to coerce(强制) other language objects (names and calls) to expressions, and vectors and other classed objects to character vectors byas.character. If labels is longer than x and y, the coordinates(坐标) are recycled to the length of labels.

adj

one or two values in [0, 1] which specify(指定) the x (and optionally(可选择的) y) adjustment(调整) of the labels(标签). On most devices(装置) values outside that interval will also work.

pos

a position specifier for the text. If specified this overrides(代理佣金) any adj value given. Values of 1, 2, 3 and 4, respectively(分别地) indicate(表明) positions below, to the left of, above and to the right of the specified coordinates.

offset

when pos is specified(指定), this value gives the offset(抵消) of the label(标签) from the specified coordinate(坐标) in fractions(分数) of a character width.

vfont

NULL for the current font family, or a character vector(矢量) of length 2 for Hershey vector fonts. The first element(元素) of the vector selects a typeface and the second element selects a style. Ignored(驳回诉讼) if labels is an expression.

cex

numeric character expansion factor(因素)multiplied by par("cex") yields(产量) the final character size. NULL and NA are equivalent to 1.0.

col, font

the color and (if vfont = NULL) font to be used, possibly vectors(矢量). These default to the values of the global graphical parameters in par().

...

further graphical parameters (from par), such as srt, family and xpd.

Details

labels must be of type character or expression (or be coercible(可强迫的) to such a type). In the latter case, quite a bit of mathematical(数学的) notation(符号) is available such as sub- and superscripts(上标), greek letters,fractions(分数), etc.

adj allows adjustment of the text with respect to (x, y). Values of 0, 0.5, and 1 specify(指定) left/bottom, middle and right/top alignment(队列), respectively(分别地). The default is for centered text, i.e., adj = c(0.5, NA).Accurate(精确的) vertical(垂直的) centering needs character metric(度量标准) information on individual(个人的) characters which is only available on some devices(装置). Vertical alignment is done slightly differently for character strings and for expressions: adj = c(0,0) means to left-justify and to align(结盟) on the baseline for strings but on the bottom of the bounding box for expressions. This also affects vertical(垂直的) centering: for strings the centeringexcludes(排除) any descenders(下降) whereas(然而) for expressions it includes them. Using NA for strings centers them, including descenders.

The pos and offset arguments can be used in conjunction(结合) with values returned by identify to recreate(再创造) an interactively(交互式地) labelled(贴上标签的) plot(情节).

Text can be rotated(旋转的) by using graphical parameters srt (see par)this rotates about the centre set by adj.

Graphical parameters col, cex and font can be vectors(矢量) and will then be applied cyclically(周期的) to the labels (and extra values will be ignored(驳回诉讼)). NA values of font are replaced by par("font"), and similarly for col.

Labels whose x, y or labels value is NA are omitted(省略) from the plot(情节).

What happens when font = 5 (the symbol(象征) font) is selected can be both device- and locale-dependent. Most often labels will be interpreted(说明) in the Adobe symbol encoding, so e.g. "d" is delta, and "\300" is aleph.

Euro symbol

The Euro symbol may not be available in older fonts. In current versions of Adobe symbol fonts it is character 160, so text(x, y, "\xA0", font = 5) may work. People using Western European locales(场所) on Unix-alikes can probably select ISO-8895-15 (Latin-9) which has the Euro as character 165: this can also be used for postscript and pdf. It is \u20ac in Unicode, which can be used in UTF-8 locales(场所).

In all the European Windows encodings the Euro is symbol(象征) 128 and \u20ac will work in all locales: however not all fonts will include it. It is not in the symbol font used for windows and related devices(装置), including the Windows printer.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth &Brooks/Cole.

Murrell, P. (2005) R Graphics. Chapman(叫卖小贩) &Hall/CRC Press.

See Also

text.formula for the formula(公式) methodmtext, title, Hershey for details on Hershey vector(矢量) fonts, plotmath for details and more examples on mathematical(数学的) annotation(注释).

Examples

plot(-1:1, -1:1, type = "n", xlab = "Re", ylab = "Im")

K <- 16text(exp(1i * 2 * pi * (1:K) / K), col = 2)

## The following two examples use latin1 characters: these may not

## appear correctly (or be omitted entirely).

plot(1:10, 1:10, main = "text(...) examples\n~~~~~~~~~~~~~~",

sub = "R is GNU ©, but not ® ...")

mtext("«Latin-1 accented chars»: éè øØ å<Å æ<Æ", side = 3)

points(c(6,2), c(2,1), pch = 3, cex = 4, col = "red")

text(6, 2, "the text is CENTERED around (x,y) = (6,2) by default",

cex = .8)

text(2, 1, "or Left/Bottom - JUSTIFIED at (2,1) by 'adj = c(0,0)'",

adj = c(0,0))

text(4, 9, expression(hat(beta) == (X^t * X)^{-1} * X^t * y))

text(4, 8.4, "expression(hat(beta) == (X^t * X)^{-1} * X^t * y)",

cex = .75)

text(4, 7, expression(bar(x) == sum(frac(x[i], n), i==1, n)))

## Two more latin1 examples

text(5, 10.2,

"Le français, c'est façile: Règles, Liberté, Egalité, Fraternité...")

text(5, 9.8,

"Jetz no chli züritüütsch: (noch ein bißchen Zürcher deutsch)")

1、创建数据集

hospital <- c("New York", "California")patients <- c(150,350)

costs <- c(3.1,2.5)

df <- data.frame(hospital, patients, costs)

2、创建新的变量

df$totcosts <- df$patients * df$costs

3、改变变量的名称

df$costs_euro <- df$costs

df$costs <- NULL

df$patients <- ifelse(df$patients==150,100,ifelse(df$patients==350,300,NA))

4、合并数据集

finaldt <- merge(dataset1, dataset2, by="id")

或者

finaldt <- cbind(dataset1, dataset2)

finaldt <- rbind(dataset1, dataset2)

5、子数据集

dt <- iris[,c("Sepal.Length" ,"Sepal.Width")]

dt <- iris[,c(-2,-3)]#去除第2、3列变量数据

dt2<- subset(dt,Age>40&Sex==men)#对数据集dt筛选满足条件的数据