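F test (test of homogeneity of variance): compares the variances of two groups of data to decide whether their precision differs significantly. In other words, the F test is used to judge whether two population variances are equal.
Prerequisite for the F test: the data are normally distributed. Use the Shapiro-Wilk test to check normality.
Original article: https://blog.csdn.net/rojyang/article/details/102900097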
# Normality test
> shapiro.test(x)
The result is a list with the following components:
statistic
the value of the Shapiro-Wilk statistic.
p.value
an approximate p-value for the test. This is said in Royston (1995) to be adequate for p.value < 0.1.
method
the character string "Shapiro-Wilk normality test".
data.name
a character string giving the name(s) of the data.
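As a minimal sketch of how these components are read in practice, the snippet below runs the test on the high-protein weight-gain data used later in this article. The 0.05 threshold is an assumed, conventional significance level, not something fixed by `shapiro.test`.

```r
# Weight gain (g) of the high-protein group, taken from the example in this article
high <- c(134, 146, 106, 119, 124, 161, 107, 83, 113, 129, 97, 123)

res <- shapiro.test(high)

# The result is an "htest" list; pull out the components documented above
w_stat  <- unname(res$statistic)
p_value <- res$p.value

# alpha = 0.05 is an assumed, conventional threshold
if (p_value > 0.05) {
  message("No evidence against normality (p = ", signif(p_value, 3), ")")
} else {
  message("Normality is rejected (p = ", signif(p_value, 3), ")")
}
```

A large p-value only means that the test found no evidence against normality; with samples this small the test has limited power.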
### Implementing the F test
var.test(x, ...)
## Default S3 method:
var.test(x, y, ratio = 1,
alternative = c("two.sided", "less", "greater"),
conf.level = 0.95, ...)
## S3 method for class 'formula'
var.test(formula, data, subset, na.action, ...)
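To make the default method concrete, the following sketch computes the F statistic and the two-sided p-value by hand and compares them with `var.test`. It uses the two weight-gain samples from the example later in this article; the variable names are illustrative.

```r
# Data from the feed example in this article
high <- c(134, 146, 106, 119, 124, 161, 107, 83, 113, 129, 97, 123)
low  <- c(70, 118, 101, 85, 107, 132, 94)

# F statistic: ratio of the two sample variances (first sample in the numerator)
f_manual <- var(low) / var(high)
df1 <- length(low) - 1   # numerator degrees of freedom
df2 <- length(high) - 1  # denominator degrees of freedom

# Two-sided p-value: twice the smaller tail probability of the F distribution
p_lower  <- pf(f_manual, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

# The built-in test should agree with the manual calculation
vt <- var.test(low, high)
c(manual = f_manual, var.test = unname(vt$statistic))
c(manual = p_manual, var.test = vt$p.value)
```

Swapping the two samples inverts the ratio and swaps the degrees of freedom, but leaves the two-sided p-value unchanged.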
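Tutorial 2: https://www.jianshu.com/p/7e510ac22c64
Assuming the data are normally distributed, the F test checks whether the variances agree. When samples are drawn at random from two populations and are to be compared, the first step is to judge whether the two population variances are equal, i.e. homogeneity of variance. If the variances are equal, use the t-test directly; if not, use the t' test, a variable transformation, or a rank-sum test. The F test is the tool for judging whether the two population variances are equal. Put simply, testing whether two sample variances differ significantly (the F test) is the precondition for choosing which independent-samples t-test to run: one version when the variances are homogeneous, another when they are not.
Three tests of homogeneity of variance are commonly used in R: bartlett.test, var.test, and leveneTest (from the car package). Strictly speaking, only var.test is the two-sample F test.
The first two test the variances of the raw data. leveneTest tests homogeneity across groups on the residuals of the model, that is, on the absolute deviations of each observation from its group centre (the median by default). Since the usual requirement is that the residual variances be homogeneous, most statistical software runs Levene's test.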
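The description of Levene's test above can be sketched in base R, without the car package: take absolute deviations from each group's median and run a one-way ANOVA on them. This is an illustrative reconstruction, assuming the median-centred variant that `leveneTest` reports as `center = median`; the variable names are hypothetical.

```r
# Data from the feed example in this article
high <- c(134, 146, 106, 119, 124, 161, 107, 83, 113, 129, 97, 123)
low  <- c(70, 118, 101, 85, 107, 132, 94)

x     <- c(high, low)
group <- factor(rep(c("High", "Low"), times = c(length(high), length(low))))

# Absolute deviation of every observation from its own group's median
z <- abs(x - ave(x, group, FUN = median))

# One-way ANOVA on the deviations: its F test is the Levene statistic
levene  <- anova(lm(z ~ group))
f_value <- levene[["F value"]][1]
p_value <- levene[["Pr(>F)"]][1]
levene
```

For these data the result agrees with the `leveneTest` output quoted later in this article (F value 0.0088, Pr(>F) 0.9264).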
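################# Independent-samples t-test #################
#There are two cases: the two population variances are equal, or they are unequal.
#################Two samples with equal variances
#One-month-old rats were fed either a high-protein or a low-protein diet. After 3 months the weight gain (g) of both groups was measured:
#High-protein group: 134,146,106,119,124,161,107,83,113,129,97,123
#Low-protein group: 70,118,101,85,107,132,94
#Does weight gain differ significantly between the two diets?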
High<-c(134,146,106,119,124,161,107,83,113,129,97,123)
Low<-c(70,118,101,85,107,132,94)
Group<-c(rep(1,12),rep(0,7))#1 = High, 0 = Low
x<-c(High,Low)
DATA<-data.frame(x,Group)
DATA$Group<-as.factor(DATA$Group)
#############Three common homogeneity-of-variance tests in R: bartlett.test, var.test, leveneTest#############
#bartlett.test: Bartlett's test of homogeneity of variances
bartlett.test(x~Group)
Bartlett test of homogeneity of variances
data: x by Group
Bartlett's K-squared = 0.0066764, df = 1, p-value = 0.9349
#var.test: F test for equality of two variances
var.test(x~Group)
F test to compare two variances
data: x by Group
F = 0.94107, num df = 6, denom df = 11, p-value = 0.9917
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.2425021 5.0909424
sample estimates:
ratio of variances
0.941066
#leveneTest: Levene's test of homogeneity of variance (also the default homogeneity test in SPSS)
library(car)
leveneTest(DATA$x,DATA$Group)
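Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0088 0.9264
      17
#The first two tests work on the variances of the raw data; leveneTest tests homogeneity across groups on the model residuals, which is why most statistical software uses it.
#All three p-values are far above 0.05, so the two independent samples have homogeneous variances and an independent-samples t-test can be run.
####Run the t-test directly.#####
#Note: t.test() defaults to var.equal=FALSE, so the call below runs Welch's t-test, as the output header shows. Pass var.equal=TRUE to get the pooled-variance Student's t-test.
t.test(High,Low,paired=FALSE)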
Welch Two Sample t-test
data: High and Low
t = 1.9319, df = 13.016, p-value = 0.07543
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.263671 40.597005
sample estimates:
mean of x mean of y
120.1667 101.0000
The result shows no significant difference in weight gain between the two diets (p = 0.075 > 0.05).
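Because the variances were judged homogeneous, the pooled-variance test is the version the decision rule above calls for. The sketch below runs both versions on the same data so they can be compared; the object names are illustrative.

```r
# Data from the feed example above
high <- c(134, 146, 106, 119, 124, 161, 107, 83, 113, 129, 97, 123)
low  <- c(70, 118, 101, 85, 107, 132, 94)

# Student's t-test with a pooled variance estimate (assumes equal variances)
pooled <- t.test(high, low, var.equal = TRUE)

# Welch's t-test, the default in t.test() (no equal-variance assumption)
welch <- t.test(high, low)

# Compare degrees of freedom and p-values side by side
rbind(
  pooled = c(df = unname(pooled$parameter), p = pooled$p.value),
  welch  = c(df = unname(welch$parameter),  p = welch$p.value)
)
```

The pooled test uses n1 + n2 - 2 = 17 degrees of freedom, while Welch's test uses a smaller, non-integer approximation.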
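#################Two samples with unequal variances########################
#The iron content (mg/kg) of a feed was measured in two regions, Jia and Yi, with the following results:
#Jia: 5.9,3.8,6.5,18.3,18.2,16.1,7.6
#Yi: 7.5,0.5,1.1,3.2,6.5,4.1,4.7
#Does the iron content of this feed differ significantly between the two regions?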
JIA<-c(5.9,3.8,6.5,18.3,18.2,16.1,7.6)
YI<-c(7.5,0.5,1.1,3.2,6.5,4.1,4.7)
Content<-c(JIA,YI)
Group<-c(rep(1,7),rep(2,7))#1 = Jia, 2 = Yi
data<-data.frame(Content,Group)
data$Group<-as.factor(Group)
###############The three homogeneity-of-variance tests in R##############
#bartlett.test: Bartlett's test of homogeneity of variances
bartlett.test(Content~Group)
Bartlett test of homogeneity of variances
data: Content by Group
Bartlett's K-squared = 3.9382, df = 1, p-value = 0.0472
#var.test: F test for equality of two variances
var.test(Content~Group)
F test to compare two variances
data: Content by Group
F = 5.9773, num df = 6, denom df = 6, p-value = 0.04695
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
1.02707 34.78643
sample estimates:
ratio of variances
5.9773
#leveneTest: Levene's test of homogeneity of variance (also the default homogeneity test in SPSS)
library(car)
leveneTest(data$Content,data$Group)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1   3.073 0.1051
      12
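#Bartlett's test and the F test reject equal variances at the 0.05 level (p = 0.047), whereas Levene's test does not (p = 0.105). The variances are treated as unequal here.
####When the two groups have unequal variances, use the t' test (Welch's t-test), a variable transformation, or a rank-sum test.
t.test(JIA,YI,paired=FALSE,var.equal=FALSE)##var.equal=FALSE requests Welch's t-test for unequal variances; FALSE is already the default.
#The call must compare the two samples (JIA and YI, or equivalently Content~Group). Calling t.test(Content,Group,...) instead compares all 14 measurements with the group labels 1 and 2 and reports "mean of x = 7.428571, mean of y = 1.5", t = 3.7511, p = 0.002362; those figures do not answer the question.
#For the corrected call the sample means are 10.914 (Jia) and 3.943 (Yi). A hand calculation gives roughly t = 2.70 on about 8 degrees of freedom, beyond the two-sided 5% critical value of about 2.31.
#This indicates that the iron content of the feed differs significantly between the two regions.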
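The other two options named above, the rank-sum test and a variable transformation, can be sketched as follows. The log transformation is only one possible choice and is an assumption here; `exact = FALSE` is set because the value 6.5 occurs in both samples, so an exact p-value is not available.

```r
# Iron content (mg/kg) from the example above
jia <- c(5.9, 3.8, 6.5, 18.3, 18.2, 16.1, 7.6)
yi  <- c(7.5, 0.5, 1.1, 3.2, 6.5, 4.1, 4.7)

# Rank-sum (Wilcoxon / Mann-Whitney) test: no normality or equal-variance assumption
rank_sum <- wilcox.test(jia, yi, exact = FALSE)

# Variable transformation: compare the samples on the log scale
# (all values are positive, so log() is defined)
log_welch <- t.test(log(jia), log(yi))

# Variance ratio before and after the transformation
var_ratio_raw <- var(jia) / var(yi)
var_ratio_log <- var(log(jia)) / var(log(yi))

c(rank_sum_p = rank_sum$p.value, log_welch_p = log_welch$p.value)
c(raw = var_ratio_raw, log = var_ratio_log)
```

Whether a transformation actually brings the variances closer together should be checked on the transformed data, for example by rerunning the homogeneity tests on them.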
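However, all of these functions take raw data as input. If a problem gives only the means and standard deviations, how can the F test be carried out?
One workaround is to simulate two normally distributed samples with the reported means and standard deviations, pass them to var.test, and then run t.test.
Note that rnorm draws random values, so the output changes on every run and reflects the simulated samples rather than the reported statistics. The F statistic itself can be computed directly as the ratio of the two squared standard deviations.
> control<-rnorm(20,0.26,0.22)
> case<-rnorm(20,0.21,0.18)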
> var.test(control,case)
F test to compare two variances
data: control and case
F = 1.8451, num df = 19, denom df = 19, p-value = 0.191
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.7303264 4.6616400
sample estimates:
ratio of variances
1.845134
> t.test(control,case,paired = FALSE,var.equal=TRUE)
Two Sample t-test
data: control and case
t = -0.61943, df = 38, p-value = 0.5393
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.16295051 0.08659395
sample estimates:
mean of x mean of y
0.2027261 0.2409044
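A deterministic alternative is to compute both tests directly from the summary statistics. The sketch below is a minimal illustration: the helper names `f_test_from_sd` and `t_test_from_summary` are hypothetical, and the sample size of 20 per group is an assumption carried over from the `rnorm(20, ...)` calls above.

```r
# F test for equal variances from standard deviations and sample sizes
f_test_from_sd <- function(sd1, n1, sd2, n2) {
  f   <- sd1^2 / sd2^2
  df1 <- n1 - 1
  df2 <- n2 - 1
  p_lower <- pf(f, df1, df2)
  list(statistic = f, df = c(df1, df2),
       p.value = 2 * min(p_lower, 1 - p_lower))  # two-sided
}

# Pooled-variance two-sample t-test from means, SDs and sample sizes
t_test_from_summary <- function(m1, sd1, n1, m2, sd2, n2) {
  df  <- n1 + n2 - 2
  sp2 <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / df  # pooled variance
  t   <- (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
  list(statistic = t, df = df, p.value = 2 * pt(-abs(t), df))
}

# Reported statistics: control 0.26 (SD 0.22), case 0.21 (SD 0.18); n = 20 is assumed
f_res <- f_test_from_sd(0.22, 20, 0.18, 20)
t_res <- t_test_from_summary(0.26, 0.22, 20, 0.21, 0.18, 20)

f_res$statistic; f_res$p.value
t_res$statistic; t_res$p.value
```

Unlike the simulation, these results depend only on the reported values and are the same on every run.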
Another type of problem involves a contingency table and asks for a confidence interval.
#7.1
tab <- as.table(cbind(c(2205,2316), c(21358,16663))) #build the 2x2 contingency table from the observed counts only
dimnames(tab) <- list(c("Surviving_adults", "Surviving_children"),
c("seriously_injured", "other_injured"))
addmargins(tab) #display the table with row and column totals
tab_Xsqtest <- chisq.test(tab) #the totals must not be part of the table passed to chisq.test
tab_Xsqtest
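`chisq.test` reports a test statistic and p-value but no confidence interval. One way to obtain an interval, sketched here under the assumption that the quantity of interest is the difference between the two groups' proportions of seriously injured, is `prop.test`; the object names are illustrative.

```r
# Seriously injured counts and group totals from the table above
seriously_injured <- c(Surviving_adults = 2205, Surviving_children = 2316)
group_totals      <- c(23563, 18979)

# Two-sample test of equal proportions; also returns a confidence interval
# for the difference in proportions (adults minus children)
prop_res <- prop.test(x = seriously_injured, n = group_totals, conf.level = 0.95)

prop_res$estimate  # the two sample proportions
prop_res$conf.int  # 95% confidence interval for their difference

# For a 2x2 table the chi-squared statistic is the same as in chisq.test
tab <- matrix(c(2205, 2316, 21358, 16663), nrow = 2)
chi <- chisq.test(tab)
c(prop.test = unname(prop_res$statistic), chisq.test = unname(chi$statistic))
```

Both functions apply Yates' continuity correction by default for a 2x2 table, which is why the statistics coincide.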