Multivariate Analysis of Variance (MANOVA) in R

Python013

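1. When there is more than one dependent (outcome) variable, multivariate analysis of variance (MANOVA) can be used to analyze them simultaneously.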

library(MASS)

attach(UScereal)

y <- cbind(calories, fat, sugars)

aggregate(y, by = list(shelf), FUN = mean)

Group.1 calories fat sugars

1 1 119.4774 0.6621338 6.295493

2 2 129.8162 1.3413488 12.507670

3 3 180.1466 1.9449071 10.856821

cov(y)

calories fat sugars

calories 3895.24210 60.674383 180.380317

fat 60.67438 2.713399 3.995474

sugars 180.38032 3.995474 34.050018

fit <- manova(y ~ shelf)

summary(fit)

Df Pillai approx F num Df den Df Pr(>F)

shelf 1 0.19594 4.955 3 61 0.00383 **

Residuals 63

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

summary.aov(fit)

Response calories :

Df Sum Sq Mean Sq F value Pr(>F)

shelf 1 45313 45313 13.995 0.0003983 ***

Residuals 63 203982 3238

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Response fat :

Df Sum Sq Mean Sq F value Pr(>F)

shelf 1 18.421 18.4214 7.476 0.008108 **

Residuals 63 155.236 2.4641

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Response sugars :

Df Sum Sq Mean Sq F value Pr(>F)

shelf 1 183.34 183.34 5.787 0.01909 *

Residuals 63 1995.87 31.68

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
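The output above reports 1 degree of freedom for shelf, which suggests that shelf entered the model as a numeric variable (its values are 1, 2 and 3). If the intent is to compare the three shelves as groups, shelf would need to be a factor. The following is a minimal sketch under that assumption; the names cereal and fit_factor are introduced here for illustration, and no output is shown because it will differ from the results above.

```r
library(MASS)  # provides the UScereal data set

# Work on a copy so that the attached UScereal data are left untouched.
cereal <- UScereal
cereal$shelf <- factor(cereal$shelf)   # 1, 2, 3 become group labels

# Same three responses as above, now with shelf as a 3-level factor.
fit_factor <- manova(cbind(calories, fat, sugars) ~ shelf, data = cereal)

summary(fit_factor)                    # Pillai's trace (the default test)
summary(fit_factor, test = "Wilks")    # Wilks' lambda as an alternative
```

With a factor, the shelf row of the summary should show 2 degrees of freedom (three groups minus one).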

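2. Assessing the test assumptions

One-way MANOVA rests on two assumptions: multivariate normality and homogeneity of the variance-covariance matrices.

(1) Multivariate normality

The first assumption is that the vector of dependent variables follows a multivariate normal distribution. A Q-Q plot can be used to check it.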

center <- colMeans(y)

n <- nrow(y)

p <- ncol(y)

cov <- cov(y)

d <- mahalanobis(y, center, cov)

coord <- qqplot(qchisq(ppoints(n), df = p), d,

main = "Q-Q Plot Assessing Multivariate Normality",

ylab = "Mahalanobis D2")

abline(a = 0, b = 1)

identify(coord$x, coord$y, labels = row.names(UScereal))

If the points all fall on (or close to) the line, the data are consistent with multivariate normality.
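To make clear what the plot compares, the squared Mahalanobis distance of each row can be computed by hand as (x - center)' S^-1 (x - center); under multivariate normality these distances approximately follow a chi-square distribution with p degrees of freedom. The sketch below rebuilds the distances manually and checks them against mahalanobis(). The names S, centered, d_manual and d_builtin are introduced here for illustration only.

```r
library(MASS)  # provides the UScereal data set

y <- as.matrix(UScereal[, c("calories", "fat", "sugars")])
center <- colMeans(y)
S <- cov(y)

# Subtract the column means from every row.
centered <- sweep(y, 2, center)

# Row-wise quadratic form: (x - center)' S^-1 (x - center).
d_manual <- rowSums((centered %*% solve(S)) * centered)

# Built-in equivalent used in the Q-Q plot code above.
d_builtin <- mahalanobis(y, center, S)

all.equal(unname(d_manual), unname(d_builtin))
```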

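(2) Homogeneity of the variance-covariance matrices means that the covariance matrix is the same in every group. This assumption is usually assessed with Box's M test.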
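Base R has no Box's M function, and the original does not say which implementation it has in mind. As a rough sketch only, the usual chi-square approximation to Box's M can be written out by hand; the function box_m below is hypothetical (not from any package), and it treats shelf as a three-level grouping factor, which is an assumption.

```r
library(MASS)  # provides the UScereal data set

Y <- as.matrix(UScereal[, c("calories", "fat", "sugars")])
g <- factor(UScereal$shelf)

# Hypothetical helper: Box's M with its chi-square approximation.
box_m <- function(Y, group) {
  p   <- ncol(Y)
  k   <- nlevels(group)
  n_i <- as.vector(table(group))            # group sizes, in level order
  N   <- sum(n_i)

  # Per-group covariance matrices and the pooled covariance matrix.
  covs   <- lapply(split(as.data.frame(Y), group), cov)
  pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, covs, n_i)) / (N - k)

  logdet <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)

  M <- (N - k) * logdet(pooled) - sum((n_i - 1) * sapply(covs, logdet))

  # Correction factor for the chi-square approximation.
  corr <- (sum(1 / (n_i - 1)) - 1 / (N - k)) *
    (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (k - 1))

  stat <- M * (1 - corr)
  df   <- p * (p + 1) * (k - 1) / 2
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

res <- box_m(Y, g)
res
```

A small p-value would indicate that the group covariance matrices differ. The test is known to be sensitive to departures from normality, so the result is best read together with the Q-Q plot.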

3. Detecting multivariate outliers

library(mvoutlier)

outliers <- aq.plot(y)

outliers
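If the mvoutlier package is not available, a simpler check can be sketched in base R by flagging rows whose squared Mahalanobis distance exceeds a chi-square quantile. This is a swapped-in technique, not what aq.plot() does: it uses the classical mean and covariance rather than robust estimates, and the 0.975 cutoff is an assumption chosen for illustration.

```r
library(MASS)  # provides the UScereal data set

y <- as.matrix(UScereal[, c("calories", "fat", "sugars")])

# Squared Mahalanobis distance of each cereal from the overall centroid.
d2 <- mahalanobis(y, colMeans(y), cov(y))

# Assumed cutoff: the 97.5% quantile of a chi-square with p degrees of freedom.
cutoff <- qchisq(0.975, df = ncol(y))

flagged <- d2 > cutoff
rownames(y)[flagged]   # names of the cereals flagged as potential outliers
```

Because outliers inflate the classical covariance estimate, this sketch can miss points that a robust method would flag.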

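1. RMSE (root mean square error), also called the standard error

Suppose the data are in A1:Z1.

For the standard deviation, use =STDEV(A1:Z1)

For the variance, use =VARA(A1:Z1)

2. MRE (mean relative error)

Excel / Functions / Statistical / STDEV (Sd)

Compute the standard deviation Sd, then divide it by the mean and multiply by 100%.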
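For readers working in R rather than Excel, the same quantities can be sketched as follows. For purely numeric data, sd() and var() give the sample standard deviation and sample variance, as STDEV and VARA do. The vector x is made-up data used only for illustration.

```r
# Made-up sample; the values are an assumption for illustration.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

s <- sd(x)     # sample standard deviation (divides by n - 1)
v <- var(x)    # sample variance (divides by n - 1)

# Standard deviation divided by the mean, expressed in percent.
rel <- s / mean(x) * 100

c(sd = s, var = v, percent = rel)
```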

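To find the root mean square error (RMSE), we first need the residuals (also called errors) and then take the square root of the mean of their squares. So, for a linear regression model object M, the RMSE is sqrt(mean(M$residuals^2)).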
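The calculation can also be wrapped in a small helper. The sketch below is illustrative only: the function name rmse, the toy vectors, and the use of the built-in cars data are assumptions and not part of the original examples.

```r
# Hypothetical helper: RMSE between observed and predicted values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy data (made up): only the last prediction is off, by 2.
actual    <- c(1, 2, 3)
predicted <- c(1, 2, 5)
toy_rmse  <- rmse(actual, predicted)   # sqrt((0 + 0 + 4) / 3)

# For a fitted lm object, compare the observed response with the fitted values.
fit      <- lm(dist ~ speed, data = cars)   # cars ships with base R
fit_rmse <- rmse(cars$dist, fitted(fit))

# Same number as the expression based on the residuals.
all.equal(fit_rmse, sqrt(mean(fit$residuals^2)))
```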

Example

x1<-rnorm(500,50,5)

y1<-rnorm(500,50,2)

M1<-lm(y1~x1)

summary(M1)

Output

Call:

lm(formula = y1 ~ x1)

Residuals:

Min 1Q Median 3Q Max

-5.6621 -1.2257 -0.0272 1.4151 6.6421

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 50.178943 0.915473 54.812 <2e-16 ***

x1 -0.002153 0.018241 -0.118 0.906

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.966 on 498 degrees of freedom

Multiple R-squared: 2.798e-05, Adjusted R-squared: -0.00198

F-statistic: 0.01393 on 1 and 498 DF, p-value: 0.9061

Find the RMSE of model M1:

Example

sqrt(mean(M1$residuals^2))

Output

[1] 1.961622
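This value is slightly smaller than the "Residual standard error" of 1.966 in the summary. The RMSE divides the sum of squared residuals by the number of observations n, while the residual standard error divides by the residual degrees of freedom (498 here), so the two differ by a factor of sqrt(498/500); 1.966 times that factor is approximately 1.962. The sketch below checks this relationship on freshly simulated data; the seed is arbitrary and the variable names are assumptions.

```r
set.seed(42)                     # arbitrary seed, only for reproducibility
x <- rnorm(500, 50, 5)
y <- rnorm(500, 50, 2)
M <- lm(y ~ x)

n  <- length(residuals(M))       # number of observations
df <- df.residual(M)             # n minus the number of coefficients (n - 2)

rmse <- sqrt(mean(residuals(M)^2))        # divides by n
rse  <- sqrt(sum(residuals(M)^2) / df)    # divides by the residual df

# rse matches the "Residual standard error" reported by summary(M) ...
all.equal(rse, summary(M)$sigma)

# ... and the two quantities differ only by the factor sqrt(df / n).
all.equal(rmse, rse * sqrt(df / n))
```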

Example

x2<-rnorm(5000,125,21)

y2<-rnorm(5000,137,10)

M2<-lm(y2~x2)

summary(M2)

Output

Call:

lm(formula = y2 ~ x2)

Residuals:

Min 1Q Median 3Q Max

-37.425 -7.005 -0.231 6.836 36.627

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 138.683501 0.851247 162.918 <2e-16 ***

x2 -0.014386 0.006735 -2.136 0.0327 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.06 on 4998 degrees of freedom

Multiple R-squared: 0.0009121, Adjusted R-squared: 0.0007122

F-statistic: 4.563 on 1 and 4998 DF, p-value: 0.03272

Find the RMSE of model M2:

Example

sqrt(mean(M2$residuals^2))

Output

[1] 10.05584

Example

x3<-rpois(500,5)

y3<-rpois(500,10)

M3<-lm(y3~x3)

summary(M3)

Output

Call:

lm(formula = y3 ~ x3)

Residuals:

Min 1Q Median 3Q Max

-7.9004 -1.9928 -0.2155 2.1921 9.3770

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 10.17770 0.32330 31.481 <2e-16 ***

x3 -0.09244 0.06145 -1.504 0.133

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.027 on 498 degrees of freedom

Multiple R-squared: 0.004524, Adjusted R-squared: 0.002525

F-statistic: 2.263 on 1 and 498 DF, p-value: 0.1331

Find the RMSE of model M3:

Example

sqrt(mean(M3$residuals^2))

Output

[1] 3.020734

Example

x4<-runif(50000,5,10)

y4<-runif(50000,2,10)

M4<-lm(y4~x4)

summary(M4)

Output

Call:

lm(formula = y4 ~ x4)

Residuals:

Min 1Q Median 3Q Max

-4.0007 -1.9934 -0.0063 1.9956 3.9995

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 5.9994268 0.0546751 109.729 <2e-16 ***

x4 0.0001572 0.0071579 0.022 0.982

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.309 on 49998 degrees of freedom

Multiple R-squared: 9.646e-09, Adjusted R-squared: -1.999e-05

F-statistic: 0.0004823 on 1 and 49998 DF, p-value: 0.9825

Find the RMSE of model M4:

Example

sqrt(mean(M4$residuals^2))

Output

[1] 2.308586

Example

x5<-sample(5001:9999,100000,replace=TRUE)

y5<-sample(1000:9999,100000,replace=TRUE)

M5<-lm(y5~x5)

summary(M5)

Output

Call:

lm(formula = y5 ~ x5)

Residuals:

Min 1Q Median 3Q Max

-4495 -2242 -4 2230 4512

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 5.504e+03 4.342e+01 126.765 <2e-16 ***

x5 -1.891e-03 5.688e-03 -0.333 0.74

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2594 on 99998 degrees of freedom

Multiple R-squared: 1.106e-06, Adjusted R-squared: -8.895e-06

F-statistic: 0.1106 on 1 and 99998 DF, p-value: 0.7395

Find the RMSE of model M5:

Example

sqrt(mean(M5$residuals^2))

Output

[1] 2593.709
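In all five examples x and y are drawn independently, so the fitted slope is close to zero and the RMSE ends up close to the standard deviation of y. For M4, for instance, y4 is uniform on [2, 10], whose standard deviation is 8/sqrt(12), about 2.3094, against the reported RMSE of 2.308586. The sketch below repeats that comparison on fresh data; the seed and variable names are assumptions, and the exact numbers depend on the random draw.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
x <- runif(50000, 5, 10)
y <- runif(50000, 2, 10)             # independent of x by construction
M <- lm(y ~ x)

rmse_fit  <- sqrt(mean(residuals(M)^2))
sd_theory <- (10 - 2) / sqrt(12)     # standard deviation of Uniform(2, 10)

# The two values should agree to roughly two decimal places.
c(rmse = rmse_fit, sd_theory = sd_theory, sd_sample = sd(y))
```

When the predictor does carry information about the response, the RMSE drops below the standard deviation of y, which is what makes it useful as a measure of fit.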