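I'm an R beginner. I can make simple plots, but I can't yet draw whatever I have in mind. I really look up to Hadley Wickham: his ggplot2 package and the huge tidyverse collection are amazing!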
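Recently an international student in our lab asked me to help him make a volcano plot. A volcano plot is just a scatter plot, so ggplot2 can draw it perfectly well, but I'm not yet fluent enough with ggplot2 to draw anything I want. I searched Google for volcano plot methods and tutorials and tried them out on his data. I also found a recommendation on Biostars for the EnhancedVolcano package, which is built specifically for volcano plots.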
1. What the data look like and how to import them
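# Import the data
> library(ggplot2)   # lib.loc = "C:/Program Files/R/R-3.5.1/library" is only needed for a non-default library path
> library(ggrepel)
> hs_data <- read.delim("clipboard")   # reads a tab-delimited table copied to the clipboard (works on Windows); pass a file path otherwise
> head(hs_data)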
ID Name_des FC log2FC Pvalue
1 Com_2035 D-benzylpenicilloic acid 0.000381132 -11.357423 4.07555e-04
2 Com_2160 rosuvastatin 0.000768929 -10.344863 1.84000e-05
3 Com_2025 ARAMITE 0.000943335 -10.049943 2.39000e-11
4 Com_1907 Glycolic acid pentaethoxylate 4-tert-butylphenyl ether 0.001348441 -9.534492 1.98087e-04
5 Com_53 2-Pyrroloylglycine 0.001852784 -9.076090 2.05000e-07
6 Com_147 4-PHOSPHOPANTOTHENOYLCYSTEINE 0.002831398 -8.464270 9.25000e-08
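`read.delim("clipboard")` only works where R can read the system clipboard, for example on Windows. The sketch below shows a more reproducible route. It assumes the table has been saved as a tab-delimited text file; the name `hs_data.txt` is hypothetical. To keep the block self-contained, it reads the first three rows of the `head()` output above through the `text` argument instead of a file. It also checks that the `log2FC` column really is `log2(FC)`.

```r
# Hypothetical file name; with your own export this would be:
# hs_data <- read.delim("hs_data.txt")

# Self-contained stand-in: the first three rows of the head() output above,
# passed as text so that no clipboard or file is needed.
hs_data <- read.delim(text = paste(
  "ID\tName_des\tFC\tlog2FC\tPvalue",
  "Com_2035\tD-benzylpenicilloic acid\t0.000381132\t-11.357423\t4.07555e-04",
  "Com_2160\trosuvastatin\t0.000768929\t-10.344863\t1.84000e-05",
  "Com_2025\tARAMITE\t0.000943335\t-10.049943\t2.39000e-11",
  sep = "\n"
), stringsAsFactors = FALSE)

str(hs_data)

# Sanity check: the log2FC column is log2(FC)
stopifnot(max(abs(log2(hs_data$FC) - hs_data$log2FC)) < 1e-4)
```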
2. A first attempt
> ggplot(data = hs_data, aes(x = log2FC, y = -log10(Pvalue))) + geom_point()
This is not the plot we want yet: the statistically significant points need to stand out. Common cutoffs are p < 0.05 and |log2FC| >= 1.5.
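A quick base-R check helps show where these cutoffs land on the plot. The y axis shows -log10(p), so smaller p-values sit higher, and p = 0.05 becomes a horizontal line at about 1.3. On the x axis, a |log2FC| cutoff of 1.5 corresponds to a 2^1.5 (about 2.83-fold) increase or a 2^-1.5 (about 0.354-fold) decrease. The three p-values below are the cutoffs used later in this post.

```r
# p-value cutoffs used in this post and their height on the -log10 y axis
p_cuts <- c(0.05, 0.001, 1e-6)
y_cuts <- -log10(p_cuts)
round(y_cuts, 3)
# [1] 1.301 3.000 6.000

# Fold changes that correspond to |log2FC| = 1.5 (up and down)
round(2^c(1.5, -1.5), 3)
# [1] 2.828 0.354
```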
3. Refining the plot
Assign the points to color groups and plot them
# Up/Down only when p < 0.05 and |log2FC| >= 1.5; the sign of log2FC then gives the direction
> hs_data$threshold <- as.factor(ifelse(hs_data$Pvalue < 0.05 & abs(hs_data$log2FC) >= 1.5, ifelse(hs_data$log2FC > 0, 'Up', 'Down'), 'NoSignifi'))
> ggplot(data = hs_data, aes(x = log2FC, y = -log10(Pvalue), colour=threshold,label = ID)) +
geom_point(alpha=0.4, size=3.5) +
scale_color_manual(values = c(Down = "blue", NoSignifi = "grey", Up = "red"))
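An unnamed color vector is matched to the factor levels by position. If one class is missing (say no point is "Down"), `as.factor()` drops that level and every remaining color shifts by one. The sketch below avoids this by fixing the factor levels and naming the colors. `classify_points()` and `volcano_colours` are hypothetical names, not from the original post, and the toy values are made up for illustration.

```r
# Hypothetical helper: label each point "Up", "Down" or "NoSignifi"
# for the given p-value and |log2FC| cutoffs.
classify_points <- function(pvalue, log2fc, p_cut = 0.05, fc_cut = 1.5) {
  status <- ifelse(pvalue < p_cut & abs(log2fc) >= fc_cut,
                   ifelse(log2fc > 0, "Up", "Down"),
                   "NoSignifi")
  # Fixed levels keep the legend order stable even when a class is empty
  factor(status, levels = c("Down", "NoSignifi", "Up"))
}

# Named colors are matched to classes by name, not by position
volcano_colours <- c(Down = "blue", NoSignifi = "grey", Up = "red")

# Hypothetical toy values, for illustration only
toy_p  <- c(1e-4, 0.2, 1e-3, 0.01)
toy_fc <- c(-2, 3, 2.5, 0.5)
status <- classify_points(toy_p, toy_fc)
table(status)
```

In a plot this would be used as `scale_color_manual(values = volcano_colours)`.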
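Use geom_vline() and geom_hline() to draw the threshold lines, and xlim() to set the x-axis range. Note that xlim() removes any point that falls outside the range.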
> ggplot(data = hs_data, aes(x = log2FC, y = -log10(Pvalue), colour=threshold,label = ID)) +
geom_point(alpha=0.4, size=3.5) +
scale_color_manual(values = c(Down = "blue", NoSignifi = "grey", Up = "red")) +
xlim(c(-11.5, 11.5)) +
geom_vline(xintercept=c(-1.5,1.5),lty=4,col="black",lwd=0.8) +
geom_hline(yintercept = -log10(0.05),lty=4,col="black",lwd=0.8)
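The range -11.5 to 11.5 is typed in by hand. A symmetric limit can also be derived from the data. The sketch below uses only the first three log2FC values from the `head()` output; with the full data you would pass `hs_data$log2FC`. Rounding up to the next 0.5 is an arbitrary choice made here for illustration.

```r
# First three log2FC values from the head() output above;
# with the full data this would be hs_data$log2FC
log2fc <- c(-11.357423, -10.344863, -10.049943)

# Symmetric x-axis limit: the largest |log2FC|, rounded up to the next 0.5
lim <- ceiling(max(abs(log2fc)) * 2) / 2
lim
# [1] 11.5
```

`xlim(-lim, lim)` sets scale limits, so any point outside them is dropped with a "Removed ... rows" warning. `coord_cartesian(xlim = c(-lim, lim))` only zooms the view and keeps every point.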
Use labs() and theme() to rename the x and y axes, add a centered title, and remove the legend title.
> ggplot(data = hs_data, aes(x = log2FC, y = -log10(Pvalue), colour=threshold,label = ID)) +
geom_point(alpha=0.4, size=3.5) +
scale_color_manual(values = c(Down = "blue", NoSignifi = "grey", Up = "red")) +
xlim(c(-11.5, 11.5)) +
geom_vline(xintercept=c(-1.5,1.5),lty=4,col="black",lwd=0.8) +
geom_hline(yintercept = -log10(0.05),lty=4,col="black",lwd=0.8) +
labs(x="log2(fold change)",y="-log10 (p-value)",title="Differential metabolites") +
theme(plot.title = element_text(hjust = 0.5), legend.position="right", legend.title = element_blank())
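When the cutoffs change, they have to be edited in three places: the classification, geom_vline() and geom_hline(). The sketch below wraps the plot in a function so that one pair of cutoffs drives all three. `volcano_plot()` is a hypothetical name, not the author's code, and the toy data are made up. The line width argument is left out because its name changed across ggplot2 versions (`size` versus `linewidth`).

```r
library(ggplot2)

# Hypothetical wrapper: the same p/log2FC cutoffs drive both the color
# classes and the dashed threshold lines, so they cannot drift apart.
volcano_plot <- function(df, p_cut = 0.05, fc_cut = 1.5) {
  df$threshold <- factor(
    ifelse(df$Pvalue < p_cut & abs(df$log2FC) >= fc_cut,
           ifelse(df$log2FC > 0, "Up", "Down"), "NoSignifi"),
    levels = c("Down", "NoSignifi", "Up"))
  ggplot(df, aes(x = log2FC, y = -log10(Pvalue), colour = threshold)) +
    geom_point(alpha = 0.4, size = 3.5) +
    scale_color_manual(values = c(Down = "blue", NoSignifi = "grey", Up = "red"),
                       drop = FALSE) +   # keep all three legend keys
    geom_vline(xintercept = c(-fc_cut, fc_cut), linetype = 4, colour = "black") +
    geom_hline(yintercept = -log10(p_cut), linetype = 4, colour = "black") +
    labs(x = "log2(fold change)", y = "-log10(p-value)",
         title = "Differential metabolites") +
    theme(plot.title = element_text(hjust = 0.5),
          legend.position = "right", legend.title = element_blank())
}

# Hypothetical toy data, for illustration only
toy <- data.frame(log2FC = c(-4, -2, 0.5, 2, 5),
                  Pvalue = c(1e-5, 0.01, 0.3, 0.02, 1e-4))
p_loose  <- volcano_plot(toy)                              # p < 0.05, |log2FC| >= 1.5
p_strict <- volcano_plot(toy, p_cut = 0.001, fc_cut = 3)   # p < 0.001, |log2FC| >= 3
```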
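The cutoffs p < 0.05 and |log2FC| >= 1.5 are rather loose, and a great many points pass them. Tightening them to p < 0.001 and |log2FC| >= 3 narrows down the list of candidate genes or metabolites.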
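In addition, the ggrepel package can label the points that meet specific conditions, so the plot itself tells you which genes or metabolites they are.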
> hs_data$threshold <- as.factor(ifelse(hs_data$Pvalue < 0.001 & abs(hs_data$log2FC) >= 3, ifelse(hs_data$log2FC > 0, 'Up', 'Down'), 'NoSignifi'))
> ggplot(data = hs_data, aes(x = log2FC, y = -log10(Pvalue), colour=threshold,label = ID)) +
geom_point(alpha=0.4, size=3.5) +
scale_color_manual(values = c(Down = "blue", NoSignifi = "grey", Up = "red")) +
xlim(c(-11.5, 11.5)) +
geom_vline(xintercept=c(-3,3),lty=4,col="black",lwd=0.8) +
geom_hline(yintercept = -log10(0.001),lty=4,col="black",lwd=0.8) +
labs(x="log2(fold change)",y="-log10 (p-value)",title="Differential metabolites") +
theme(plot.title = element_text(hjust = 0.5), legend.position="right", legend.title = element_blank())
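How much the stricter cutoffs shrink the candidate list can be checked directly by counting the points that pass each pair of cutoffs. The vectors below are made-up toy values; with the real data you would use `hs_data$Pvalue` and `hs_data$log2FC`.

```r
# Hypothetical toy values, for illustration only
pvalue <- c(1e-5, 0.01, 0.3, 0.02, 1e-4, 2e-4)
log2fc <- c(-4, -2, 0.5, 2, 5, -3.5)

n_loose  <- sum(pvalue < 0.05  & abs(log2fc) >= 1.5)   # first cutoffs
n_strict <- sum(pvalue < 0.001 & abs(log2fc) >= 3)     # stricter cutoffs
c(loose = n_loose, strict = n_strict)
# loose strict
#     5      3
```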
4. Labeling the points you care about
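ggrepel makes labeling points easy and keeps the labels from overlapping. Let's see which genes or metabolites the points with p < 0.000001 (1e-6) and |log2FC| >= 3 actually are.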
> ggplot(data = hs_data, aes(x = log2FC, y = -log10(Pvalue), colour=threshold,label = ID)) +
geom_point(alpha=0.4, size=3.5) +
scale_color_manual(values = c(Down = "blue", NoSignifi = "grey", Up = "red")) +
xlim(c(-11.5, 11.5)) +
geom_vline(xintercept=c(-3,3),lty=4,col="black",lwd=0.8) +
geom_hline(yintercept = -log10(0.001),lty=4,col="black",lwd=0.8) +
labs(x="log2(fold change)",y="-log10 (p-value)",title="Differential metabolites") +
theme(plot.title = element_text(hjust = 0.5), legend.position="right", legend.title = element_blank()) +
geom_text_repel(
data = subset(hs_data, Pvalue < 1e-6 & abs(log2FC) >= 3),
aes(label = ID),
size = 3,
box.padding = unit(0.5, "lines"),
point.padding = unit(0.8, "lines"), segment.color = "black", show.legend = FALSE )
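A fixed cutoff such as p < 1e-6 can label too many points or none at all. One alternative, which is not the post's rule, is to label the n most significant points among those with |log2FC| >= 3. `top_n_labels()` is a hypothetical helper and the toy data are made up. Its result would be passed as `data =` to geom_text_repel().

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(ID = paste0("Com_", 1:6),
                  log2FC = c(-4, -2, 0.5, 5, 3.2, -6),
                  Pvalue = c(1e-5, 1e-9, 0.3, 1e-4, 1e-8, 1e-3),
                  stringsAsFactors = FALSE)

# Hypothetical helper: keep points passing the fold-change cutoff,
# sort them by p-value and return the n smallest
top_n_labels <- function(df, n = 3, fc_cut = 3) {
  hits <- df[abs(df$log2FC) >= fc_cut, ]
  head(hits[order(hits$Pvalue), ], n)
}

top_n_labels(toy)$ID
# [1] "Com_5" "Com_1" "Com_4"
```

In the plot above this would read, for example, `geom_text_repel(data = top_n_labels(hs_data, n = 10), aes(label = ID), ...)`.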
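I haven't yet learned more flexible labeling, such as labeling one or a few specific points, or lining the labels up horizontally or vertically. Questions about data visualization in R are welcome; let's discuss them.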
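One possible starting point for both wishes is sketched below under stated assumptions. To label hand-picked points, pass a subset selected with `%in%` to geom_text_repel(). To line the labels up, combine `nudge_x` with `direction = "y"`, which lets ggrepel move labels only vertically so they stack in a column. The IDs in `picked` and the toy data are hypothetical.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical toy data, for illustration only
toy <- data.frame(ID = paste0("Com_", 1:6),
                  log2FC = c(-4, -2, 0.5, 5, 3.2, -6),
                  Pvalue = c(1e-5, 1e-9, 0.3, 1e-4, 1e-8, 1e-3),
                  stringsAsFactors = FALSE)

# IDs chosen by hand (hypothetical)
picked <- c("Com_1", "Com_4", "Com_6")
label_df <- toy[toy$ID %in% picked, ]

p <- ggplot(toy, aes(x = log2FC, y = -log10(Pvalue))) +
  geom_point(colour = "grey40") +
  geom_point(data = label_df, colour = "red") +   # highlight the chosen points
  geom_text_repel(
    data = label_df,
    aes(label = ID),
    nudge_x = 1,          # push labels to the right of their points
    direction = "y",      # move labels only vertically, so they form a column
    hjust = 0,            # left-align the text at its anchor
    segment.color = "black"
  )
```

With `direction = "x"` and `nudge_y` instead, the labels would be arranged in a row.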