How to test whether data follow a t distribution in R

Python014


ks.test() implements the Kolmogorov-Smirnov (KS) test, which checks whether a sample comes from a specified continuous distribution.

In your case, the call is:

ks.test(data, "pt", df = df)  # data: the sample; df: degrees of freedom of the hypothesized t distribution
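The following is a self-contained sketch of that call. The sample is simulated with rt() only for illustration, and df = 5 is an assumed value, so substitute your own data and degrees of freedom.

Keep two points in mind:

- pt() is the standard t distribution (centred at 0, scale 1). Data with a different location or scale must be rescaled first.
- If df, or the location and scale, is estimated from the same data, the p-value is only approximate.

```r
set.seed(123)
data <- rt(200, df = 5)   # illustrative sample only; substitute your own data
df <- 5                   # assumed degrees of freedom of the hypothesized t distribution

# One-sample KS test of H0: data come from a t distribution with df degrees of freedom
res <- ks.test(data, "pt", df = df)
print(res)

# A small p-value is evidence against H0; a large one only means the data
# are consistent with H0, not that H0 is proven
res$p.value
```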

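There are many ways to examine the distribution of a univariate data set. The simplest is to look at the numbers directly. The functions summary and fivenum give two slightly different summaries. In addition, stem (a "stem and leaf" plot) displays the numbers in the whole data set.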

>attach(faithful)

>summary(eruptions)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.600   2.163   4.000   3.488   4.454   5.100

>fivenum(eruptions)

[1] 1.6000 2.1585 4.0000 4.4585 5.1000
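The quartiles differ between the two summaries for a specific reason. summary() uses quantile() with its default rule (type 7). fivenum() instead returns Tukey's hinges, roughly the medians of the lower and upper halves of the data. A short check on the same data:

```r
x <- faithful$eruptions

# Quartiles from the default quantile rule (type 7), the rule summary() relies on
q7 <- unname(quantile(x, c(0.25, 0.75)))

# Lower and upper hinges from Tukey's five-number summary
hinges <- fivenum(x)[c(2, 4)]

rbind(quantile_type7 = q7, tukey_hinges = hinges)
```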

>stem(eruptions)

  The decimal point is 1 digit(s) to the left of the |

  16 | 070355555588
  18 | 000022233333335577777777888822335777888
  20 | 00002223378800035778
  22 | 0002335578023578
  24 | 00228
  26 | 23
  28 | 080
  30 | 7
  32 | 2337
  34 | 250077
  36 | 0000823577
  38 | 2333335582225577
  40 | 0000003357788888002233555577778
  42 | 03335555778800233333555577778
  44 | 02222335557780000000023333357778888
  46 | 0000233357700000023578
  48 | 00000022335800333
  50 | 0370

A stem-and-leaf plot is similar to a histogram, and R plots histograms with the function hist.

>hist(eruptions)

>## make the bins smaller, make a plot of density

>hist(eruptions, seq(1.6, 5.2, 0.2), prob=TRUE)

>lines(density(eruptions, bw=0.1))

>rug(eruptions) # show the actual data points

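More elegant density plots can be made with the function density; in this example we added a curve produced by density. The bandwidth bw was chosen by trial and error, because the default gives too much smoothing (it usually does for "interesting" densities). Better automatic methods of bandwidth selection are available, and in this example bw = "SJ" gives a good result.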
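To see how much the bandwidth matters, the sketch below compares three choices on the same histogram:

- the default bandwidth;
- the hand-picked bw = 0.1 from the text;
- the automatic bw = "SJ" choice.

The line types and legend are illustrative choices only.

```r
x <- faithful$eruptions

d_default <- density(x)             # default bandwidth rule (bw.nrd0)
d_manual  <- density(x, bw = 0.1)   # the hand-picked value used in the text
d_sj      <- density(x, bw = "SJ")  # Sheather-Jones automatic selection

# Bandwidths actually used by each estimate
c(default = d_default$bw, manual = d_manual$bw, SJ = d_sj$bw)

# Overlay the default and SJ estimates on a density-scale histogram
hist(x, seq(1.6, 5.2, 0.2), freq = FALSE, main = "eruptions")
lines(d_default, lty = 2)
lines(d_sj, lty = 1)
legend("topleft", legend = c("default bw", "bw = SJ"), lty = c(2, 1))
```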

We can plot the empirical cumulative distribution function of a data set with the function ecdf.

>plot(ecdf(eruptions), do.points=FALSE, verticals=TRUE)

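Clearly this distribution is far from any standard distribution. What about the right-hand mode, say eruptions lasting longer than 3 minutes? Let us fit a normal distribution to them and overlay the fitted CDF on their empirical CDF.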

>long <- eruptions[eruptions >3]

>plot(ecdf(long), do.points=FALSE, verticals=TRUE)

>x <- seq(3, 5.4, 0.01)

>lines(x, pnorm(x, mean=mean(long), sd=sqrt(var(long))), lty=3)
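The Kolmogorov-Smirnov statistic D reported further down is the largest vertical gap between these two curves: the ECDF of long and the fitted normal CDF. A minimal sketch that computes it by hand and compares it with ks.test():

```r
long <- faithful$eruptions[faithful$eruptions > 3]
n <- length(long)
xs <- sort(long)
i <- seq_len(n)

# Fitted normal CDF evaluated at the sorted sample values
F0 <- pnorm(xs, mean = mean(long), sd = sd(long))

# The ECDF steps from (i - 1)/n up to i/n at xs[i], so the largest gap lies
# just before or just after one of these steps (tied values are handled too)
D_manual <- max(pmax(i / n - F0, F0 - (i - 1) / n))

# Built-in statistic; ties in the data trigger a warning, suppressed here
D_builtin <- unname(suppressWarnings(
  ks.test(long, "pnorm", mean = mean(long), sd = sd(long))$statistic))

c(manual = D_manual, ks.test = D_builtin)
```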

Quantile-quantile (Q-Q) plots let us examine the agreement between the two more closely.

par(pty="s") # arrange for a square figure region

qqnorm(long); qqline(long)

The resulting Q-Q plot shows a reasonable fit, but a shorter right tail than one would expect from a normal distribution.
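qqnorm() plots the sorted sample against standard normal quantiles taken at the plotting positions ppoints(n). The sketch below rebuilds the same plot by hand. It also calls qqnorm(..., plot.it = FALSE), which returns the coordinates without drawing.

```r
long <- faithful$eruptions[faithful$eruptions > 3]
n <- length(long)

# Theoretical normal quantiles at the plotting positions used by qqnorm()
theo <- qnorm(ppoints(n))

# Coordinates qqnorm() would plot, computed without drawing
qq <- qqnorm(long, plot.it = FALSE)

# The same Q-Q plot built by hand: sorted data against sorted theoretical quantiles
plot(theo, sort(long), xlab = "Theoretical quantiles", ylab = "Sample quantiles")
qqline(long)
```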

Let us compare this with some simulated data from a t distribution:

x <- rt(250, df = 5)

qqnorm(x); qqline(x)

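Because this is a random sample, the Q-Q plot will usually show longer tails than expected for a normal distribution. We can make a Q-Q plot against the generating distribution itself with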

qqplot(qt(ppoints(250), df = 5), x, xlab = "Q-Q plot for t dsn")

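qqline(x, distribution = function(p) qt(p, df = 5)) # reference line based on t(5) quantiles, matching the x-axis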

Finally, we might want a more formal test of agreement with normality. R provides the Shapiro-Wilk test

>shapiro.test(long)

Shapiro-Wilk normality test

data: long

W = 0.9793, p-value = 0.01052

and the Kolmogorov-Smirnov test

>ks.test(long, "pnorm", mean = mean(long), sd = sqrt(var(long)))

One-sample Kolmogorov-Smirnov test

data: long

D = 0.0661, p-value = 0.4284

alternative hypothesis: two.sided

(Note that the usual distribution theory is not valid here, because we estimated the parameters of the normal distribution from the same sample.)
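One common workaround, not part of the original text, is a parametric bootstrap (the idea behind the Lilliefors test). The procedure has three steps:

1. Simulate many samples from the fitted normal.
2. Re-estimate the mean and sd for each sample and compute its D.
3. Count how often the simulated D is at least as large as the observed one.

This is a rough sketch; B = 999 and the seed are arbitrary choices.

```r
long <- faithful$eruptions[faithful$eruptions > 3]

# KS statistic against a normal whose mean and sd are estimated from x itself
ks_stat <- function(x) {
  unname(suppressWarnings(
    ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$statistic))
}

set.seed(1)
d_obs <- ks_stat(long)

# Null distribution of D when the parameters are re-estimated on each sample
B <- 999
d_boot <- replicate(B, ks_stat(rnorm(length(long), mean(long), sd(long))))

# Bootstrap p-value: share of simulated D values at least as large as observed
p_boot <- (1 + sum(d_boot >= d_obs)) / (B + 1)
p_boot
```

This reference distribution of D accounts for the estimation step. As a result, the bootstrap p-value is typically smaller than the one printed by ks.test() above.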

Reprinted from:

http://www.biostatistic.net/thread-2413-1-1.html
