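Ways to produce plots in RStudio:
- There are many functions I don't understand yet. The code follows 生信技能树 (Biotrainee). Getting the plots done still gives a small sense of achievement, and drawing something beats drawing nothing; I will work through the details later.
- Building on the previous post on statistical visualization, run the following code first: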
SraRunTable <- read.table("http://www.bio-info-trainee.com/tmp/5years/SraRunTable.txt",fill=TRUE,header = T,sep = "\t")
sample <-read.csv("sample.csv")
m=merge(SraRunTable,sample,by.x = 'Sample_Name',by.y = 'Accession')
- Explore the MBases column of the RunInfo Table read above in R, using a box plot (boxplot), the five-number summary (fivenum), a histogram (hist), and a density plot (density).
- Box plot: use boxplot()
boxplot(SraRunTable$MBases, main = "boxplot of MBases")
- Five-number summary: use fivenum() (minimum, lower hinge, median, upper hinge, maximum)
plot(fivenum(SraRunTable$MBases), main = "fivenum of MBases")
- Histogram: use plot(hist()) (hist() already draws the plot on its own, so the outer plot() is optional)
plot(hist(SraRunTable$MBases), main = "hist of MBases")
- Density plot: use plot(density())
plot(density(SraRunTable$MBases,na.rm=T), main = "density of MBases")
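For reference, fivenum() returns Tukey's five-number summary rather than quintiles. A minimal sketch on a made-up vector (`mbases_toy` is hypothetical and is not the real MBases data):

```r
# Hypothetical toy values standing in for SraRunTable$MBases
mbases_toy <- c(3, 16, 5, 2, 8, 14, 25, 11, 16, 7, NA)

# fivenum() drops NA values by default (na.rm = TRUE)
five <- fivenum(mbases_toy)
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# boxplot() draws its box from the same hinges; with no outliers the
# whisker ends equal the minimum and maximum
print(boxplot.stats(mbases_toy)$stats)
```

Printing the numbers is usually more informative than plot(fivenum(...)), which only draws the five values as points.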
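- Split the sample names in the sample information table read earlier on the underscore and tabulate the third element, which identifies the plate the sample came from.
The 'l' in lapply stands for list: it applies the given operation to every element of a list (or vector) and returns a list.
strsplit() is the string-splitting function: given a separator, it returns a list.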
title = sample$Title
class(title)
#run
> title = sample$Title
> class(title)
[1] "character"
# split each title on "_" and keep the third piece (the plate id)
plate = unlist(lapply(title,function(x){
strsplit(x,'_')[[1]][3]
}))
table(plate)
#run
> table(plate)
plate
0048 0049
384 384
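The splitting step above can be tried on a few strings to see what each function returns. This is a small sketch with made-up titles (`title_toy` is hypothetical) and vapply() swapped in for lapply() plus unlist(); it is an alternative, not the code used in this post:

```r
# Hypothetical sample titles in the same "SS2_15_<plate>_<well>" pattern
title_toy <- c("SS2_15_0049_J6", "SS2_15_0048_N2", "SS2_15_0049_M5")

# strsplit() returns a list with one character vector per input string
parts <- strsplit(title_toy, "_")
str(parts)

# vapply() works like lapply() but returns a vector of a declared type,
# so no unlist() is needed
plate_toy <- vapply(parts, function(p) p[3], character(1))

print(plate_toy)
print(table(plate_toy))
```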
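- Group the MBases column of the linked RunInfo Table by plate and test whether the difference between groups is statistically significant.
plate refers to a 384-well PCR plate; the two plates here are numbered 48 and 49.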
t.test(SraRunTable$MBases~plate)
#run
Welch Two Sample t-test
data: SraRunTable$MBases by plate
t = 2.3019, df = 728.18, p-value = 0.02162
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
0.1574805 1.9831445
sample estimates:
mean in group 0048 mean in group 0049
13.08854 12.01823
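One thing to watch in this step is that the grouping vector must be in the same row order as the MBases values it groups. A hedged sketch on simulated data (every name ending in `_toy` is made up): deriving the plate from the Title column of the merged table keeps the two aligned however merge() reorders the rows.

```r
set.seed(1)

# Simulated stand-ins for the run table and the sample table
run_toy <- data.frame(Sample_Name = paste0("GSM", 1:40),
                      MBases = rpois(40, 12),
                      stringsAsFactors = FALSE)
sample_toy <- data.frame(
  Accession = paste0("GSM", 40:1),  # deliberately in a different order
  Title = paste0("SS2_15_", rep(c("0048", "0049"), each = 20), "_A", 1:40),
  stringsAsFactors = FALSE
)

m_toy <- merge(run_toy, sample_toy, by.x = "Sample_Name", by.y = "Accession")

# Derive the grouping from the merged table itself
m_toy$plate <- vapply(strsplit(as.character(m_toy$Title), "_"),
                      function(p) p[3], character(1))

# t.test() with a formula runs Welch's two-sample test by default
res <- t.test(MBases ~ plate, data = m_toy)
print(res$p.value)
```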
- Draw the box plot (boxplot), histogram (hist), and density plot (density) by group
- Box plot
boxplot(m$MBases~plate)
typeof(plate)
#run
> typeof(plate)
[1] "character"
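I was stuck here for a long time. I like to clear the workspace before running code, and after doing that the variable e kept throwing errors; the earlier variables SraRunTable, sample, and m have to be recreated first.
- Histogram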
e = m[,c("MBases","Title")]
e$plate = plate
hist(e$MBases,freq = F, breaks = "sturges")
- Density plot
plot(density(e$MBases,na.rm=T))
- Redraw the plots above with ggplot2.
suppressMessages(library(ggplot2))
e$plate = plate
e$num=c(1:768)
colnames(e)
#run
> colnames(e)
[1] "MBases" "Title" "plate" "num"
- ggplot box plot
ggplot(e,aes(x=plate,y=MBases)) + geom_boxplot()
- ggplot histogram
ggplot(e,aes(x=MBases)) + geom_histogram(fill="lightblue",colour="grey") + facet_grid (plate ~ .)
#`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(e,aes(x=MBases,fill=plate))+geom_histogram()
# after_stat(density) replaces the older ..density.. notation (ggplot2 >= 3.3.0)
ggplot(e,aes(y=MBases,x=num)) + geom_point() + stat_density2d(aes(alpha=after_stat(density)), geom = "raster", contour = FALSE) + facet_grid(plate ~ .)
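The `bins = 30` message from geom_histogram() goes away once a bin width is given explicitly. A sketch on simulated data (`e_toy` is hypothetical), using base R's nclass.Sturges() to choose the number of bins; this is the same rule that breaks = "sturges" refers to in the base hist() call, though hist() additionally rounds the breaks to pretty values:

```r
set.seed(1)

# Simulated stand-in for the data frame e
e_toy <- data.frame(MBases = rpois(768, 12),
                    plate = rep(c("0048", "0049"), each = 384),
                    stringsAsFactors = FALSE)

# Sturges' rule gives a bin count; convert it to a bin width
n_bins <- nclass.Sturges(e_toy$MBases)
bin_width <- diff(range(e_toy$MBases)) / n_bins

# Only draw if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(e_toy, aes(x = MBases, fill = plate)) +
    geom_histogram(binwidth = bin_width, colour = "grey") +
    facet_grid(plate ~ .)
  print(p)
}
```

Passing `bins = n_bins` instead of `binwidth` would work as well; which one reads better depends on whether the width has a natural unit.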
- ggplot density plot
ggplot(e,aes(x=MBases,fill=plate))+geom_density()
- Redraw the plots above with ggpubr.
- Box plot with ggboxplot()
suppressMessages(library(ggpubr))
ggboxplot(e, x="plate", y="MBases", color = "plate", palette = "aaas",add = "jitter") + stat_compare_means(method = "t.test")
- Histogram with gghistogram()
gghistogram(e, x="MBases", fill = "plate",palette = c("#f4424e", "#41a6f4"))
#Warning message:
#Using `bins = 30` by default. Pick better value with the argument `bins`.
- Density plot with ggdensity()
ggdensity(e, x="MBases", fill = "plate", color = "plate", add = "mean", palette = c("#f4424e", "#41a6f4"))
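- Randomly draw 384 MBases values and combine them with the data from the two plates into a new data frame whose first column is the group and whose second column is MBases, for 384*3 rows in total.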
data <- e[sample(nrow(e),384),][,c(3,1,2)]
str(data)
#run
> data <- e[sample(nrow(e),384),][,c(3,1,2)]
> str(data)
'data.frame': 384 obs. of 3 variables:
$ plate : chr "0049" "0048" "0049" "0049" ...
$ MBases: int 3 16 5 2 8 14 25 11 16 7 ...
$ Title : chr "SS2_15_0049_J6" "SS2_15_0048_N2" "SS2_15_0049_M5" "SS2_15_0049_N24" ...
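The str() output above covers only the 384 randomly drawn rows; the stacking into 384*3 rows described in the task is not shown. One possible way to finish it, sketched on simulated data (`e_toy`, the column name `group`, and the label "random" are assumptions for illustration):

```r
set.seed(1)

# Simulated stand-in for the data frame e
e_toy <- data.frame(MBases = rpois(768, 12),
                    plate = rep(c("0048", "0049"), each = 384),
                    stringsAsFactors = FALSE)

# 384 rows drawn at random from all 768 cells, labelled as their own group
rand_rows <- e_toy[sample(nrow(e_toy), 384), ]
random_part <- data.frame(group = "random", MBases = rand_rows$MBases,
                          stringsAsFactors = FALSE)

# The two plates keep their plate id as the group label
plate_part <- data.frame(group = e_toy$plate, MBases = e_toy$MBases,
                         stringsAsFactors = FALSE)

# Stack them: first column is the group, second is MBases
data_toy <- rbind(plate_part, random_part)
str(data_toy)
print(table(data_toy$group))
```

With the three groups in one long data frame, a call such as boxplot(MBases ~ group, data = data_toy) compares the random draw against both plates.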
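Course recommendations
生信技能树 (Biotrainee) worldwide free lecture tour
(https://mp.weixin.qq.com/s/E9ykuIbc-2Ja9HOY0bn_6g)
Bilibili: free 74-hour bioinformatics engineer video course collection
(https://mp.weixin.qq.com/s/IyFK7l_WBAiUgqQi8O7Hxw)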