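Before we begin
Anyone who runs enrichment analyses regularly has probably wished for this: if we knew the regulatory relationships among the genes in a pathway of interest, we could identify its hub genes much more reliably; or we could locate the upstream and downstream regulators of a gene we care about within that pathway and then follow up with experimental validation. Ideas like this often stay ideas, but daring to imagine them gives them a chance of being realized, and this particular one was implemented this year.
Most people in the Chinese bioinformatics community have heard of Y叔, whose series of analysis packages arguably holds up half of the field here. The tool Immugent introduces today, CBNplot, was recently developed by Y叔 together with Yasushi Okuno of Kyoto University. The accompanying paper, "CBNplot: Bayesian network plots for enrichment analysis", was published in Bioinformatics.
生信宝库 will cover CBNplot in three posts, each built around hands-on code that walks through its main features. This post covers the first part.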
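Code walkthrough
First, we download an example dataset from GEO, identify differentially expressed genes, and then run enrichment analysis.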
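library(DESeq2)
library(ggplot2)         # theme_bw() for the PCA plot
library(org.Hs.eg.db)    # annotation database passed as OrgDb below
library(clusterProfiler) # bitr(), enrichGO(), setReadable()
library(CBNplot)         # bngeneplot()
## Load dataset and make metadata
counts = read.table("GSE133624_reads-count-all-sample.txt", header=TRUE, row.names=1)
## The first character of each sample name is used as the condition label
## (samples labelled "T" are selected later on)
meta = sapply(colnames(counts), function (x) substring(x,1,1))
meta = data.frame(meta)
colnames(meta) = c("Condition")
dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = meta,
                              design= ~ Condition)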
## Prefiltering
filt <- rowSums(counts(dds) < 10) > dim(meta)[1]*0.9
dds <- dds[!filt,]
## Perform DESeq2()
dds = DESeq(dds)
res = results(dds, pAdjustMethod = "bonferroni")
## apply variance stabilizing transformation
v = vst(dds, blind=FALSE)
vsted = assay(v)
## Plot PCA of VST values
DESeq2::plotPCA(v, intgroup="Condition") +
    ggplot2::theme_bw()
## Define the input genes, and use clusterProfiler::bitr to convert the ID.
sig = subset(res, padj<0.05)
cand.entrez = clusterProfiler::bitr(rownames(sig), fromType="ENSEMBL", toType="ENTREZID", OrgDb=org.Hs.eg.db)$ENTREZID
## Perform enrichment analysis (ORA)
pway = ReactomePA::enrichPathway(gene = cand.entrez)
pwayGO = clusterProfiler::enrichGO(cand.entrez, ont = "BP", OrgDb = org.Hs.eg.db)
## Convert to SYMBOL
pway = clusterProfiler::setReadable(pway, OrgDb=org.Hs.eg.db)
pwayGO = clusterProfiler::setReadable(pwayGO, OrgDb=org.Hs.eg.db)
## Store the similarity
pway = enrichplot::pairwise_termsim(pway)
## Define including samples
incSample = rownames(subset(meta, Condition=="T"))
allEntrez = clusterProfiler::bitr(rownames(res), fromType="ENSEMBL", toType="ENTREZID", OrgDb=org.Hs.eg.db)
res$ENSEMBL <- rownames(res)
lfc <- merge(data.frame(res), allEntrez, by="ENSEMBL")
lfc <- lfc[order(lfc$log2FoldChange, decreasing=TRUE),]
geneList <- lfc$log2FoldChange
names(geneList) <- lfc$ENTREZID
pwayGSE <- ReactomePA::gsePathway(geneList)
sigpway <- subset(pway@result, p.adjust<0.05)
paste(mean(sigpway$Count), sd(sigpway$Count))
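The prefiltering step above is easy to misread, so here is a minimal sketch of the same rule on a made-up count matrix (the gene names and counts are invented for illustration). A gene is dropped only when more than 90% of the samples have a count below 10; a gene that is low in exactly 90% of the samples is kept because the comparison is strict.

```r
# Toy count matrix: 3 genes (rows) x 10 samples (columns); all values are made up.
counts_toy <- rbind(
  geneA = c(0, 2, 1, 0, 3, 5, 0, 1, 2, 4),               # below 10 in all 10 samples
  geneB = c(50, 0, 3, 1, 2, 4, 0, 6, 7, 8),              # below 10 in 9 of 10 samples
  geneC = c(120, 95, 8, 200, 150, 3, 90, 110, 130, 170)  # below 10 in 2 of 10 samples
)

n_samples <- ncol(counts_toy)

# Number of samples in which each gene has fewer than 10 reads
low_per_gene <- rowSums(counts_toy < 10)

# Same rule as in the walkthrough: flag genes that are low in MORE than 90% of samples
filt_toy <- low_per_gene > n_samples * 0.9

# Genes that survive the filter
kept <- rownames(counts_toy)[!filt_toy]

print(low_per_gene)
print(kept)
```

With real data the same check can be run on `counts(dds)` before and after filtering to see how many genes were removed.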
With the enrichment results in hand, we can use CBNplot to display the pathways we are interested in.
barplot(pway, showCategory = 15)
# Draw the gene network with bngeneplot()
bngeneplot(results = pway, exp = vsted, pathNum = 17)
# Adjust the labels for better readability
bngeneplot(results = pway, exp = vsted, pathNum = 17, labelSize=7, shadowText=TRUE)
# Show the confidence of direction
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 13, R = 50, showDir = TRUE,
           convertSymbol = TRUE,
           expRow = "ENSEMBL",
           strThresh = 0.7)
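To make the R and strThresh arguments in the call above more concrete, the sketch below shows one common way to read them. It assumes that an edge's strength is the fraction of the R bootstrap replicates in which the edge appears, and that edges are kept when their strength reaches the threshold and the direction probability favours the listed orientation. Both rules are assumptions made for illustration, not a description of CBNplot's internals, and the gene pairs and counts are invented.

```r
# Hypothetical summary of R = 50 bootstrap replicates (all numbers are made up).
R <- 50
edges <- data.frame(
  from      = c("TP53", "MYC", "CDK1", "CCNB1"),
  to        = c("CDKN1A", "CCND1", "CCNB1", "CDK1"),
  n_present = c(46, 27, 41, 41),          # replicates in which the edge appears in either direction
  direction = c(0.88, 0.60, 0.66, 0.34),  # probability of the from -> to orientation
  stringsAsFactors = FALSE
)

# Assumed definition: strength = fraction of replicates that contain the edge
edges$strength <- edges$n_present / R

# Assumed filtering rule: strength at or above the threshold,
# and the listed orientation more likely than the reverse one
str_thresh <- 0.7
kept_edges <- edges[edges$strength >= str_thresh & edges$direction > 0.5, ]

print(kept_edges[, c("from", "to", "strength", "direction")])
```

Under these assumptions, a larger R gives finer-grained strength estimates at the cost of run time, and raising the threshold leaves a sparser network.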
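Setting compareRef=TRUE and specifying pathDb compares the inferred relationships between genes with a reference network. By default, the intersection of the two directed networks is reported as the number of overlapping edges.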
library(parallel)
# Start a 4-worker cluster and pass it to bngeneplot() through cl
cl = makeCluster(4)
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 13, R = 30, compareRef = TRUE,
           convertSymbol = TRUE, pathDb = "reactome",
           expRow = "ENSEMBL", cl = cl)
# Use compareRefType = "difference" instead of the default intersection
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 15, R = 10, compareRef = TRUE,
           convertSymbol = TRUE, pathDb = "reactome", compareRefType = "difference",
           expRow = "ENSEMBL")
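The idea of intersecting or differencing two directed networks can be sketched with plain edge lists. The example below is only an illustration of the set operations, not CBNplot's implementation: the node names are placeholders, and which side of the difference CBNplot reports is not stated in this post. Note that direction matters, so D->A and A->D count as different edges.

```r
# Two small directed edge lists; node names are placeholders.
inferred  <- data.frame(from = c("A", "B", "C", "D"),
                        to   = c("B", "C", "A", "A"),
                        stringsAsFactors = FALSE)
reference <- data.frame(from = c("A", "B", "A"),
                        to   = c("B", "C", "D"),
                        stringsAsFactors = FALSE)

# Encode each directed edge as a single string so that set operations apply
edge_key <- function(df) paste(df$from, df$to, sep = "->")

# Edges present in both networks with the same orientation
shared <- intersect(edge_key(inferred), edge_key(reference))

# Edges found in only one of the two networks
only_inferred  <- setdiff(edge_key(inferred), edge_key(reference))
only_reference <- setdiff(edge_key(reference), edge_key(inferred))

print(shared)
print(length(shared))   # number of overlapping edges
print(only_inferred)
print(only_reference)
```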
You can also add a bar plot that describes the strength and direction (probability) of the edges by setting strengthPlot=TRUE and nStrength.
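# NOTE: `dep` is not defined anywhere in this post; create it before running this call,
# or drop the sizeDep and dep arguments.
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = 15, R = 10, compareRef = TRUE,
           convertSymbol = TRUE, pathDb = "reactome", compareRefType = "intersection",
           expRow = "ENSEMBL", sizeDep = TRUE, dep = dep, strengthPlot = TRUE, nStrength = 10)
# Release the earlier 4-worker cluster before starting a larger one
stopCluster(cl)
cl = makeCluster(8)
# Several pathways can be passed to pathNum at once
bngeneplot(results = pway,
           exp = vsted,
           expSample = incSample,
           pathNum = c(15, 16), R = 10,
           convertSymbol = TRUE,
           expRow = "ENSEMBL", cl = cl)
stopCluster(cl)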
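The pathNum values used throughout (13, 15, 17, and c(15, 16)) appear to be row positions in the enrichment result table, so it helps to check which pathways and genes those rows hold before plotting. The sketch below uses a small made-up table in the same layout as an enrichment result, with genes stored as a slash-separated string. The IDs, descriptions, and genes are hypothetical, and pooling the genes of several rows is an assumption about how multiple pathways are combined.

```r
# Made-up stand-in for an enrichment result table (e.g. pway@result).
enrich_toy <- data.frame(
  ID          = c("PATH:1", "PATH:2", "PATH:3"),
  Description = c("Pathway one", "Pathway two", "Pathway three"),
  geneID      = c("CDK1/CCNB1/PLK1", "CCNB1/AURKA", "TP53/MDM2"),
  stringsAsFactors = FALSE
)

# Row positions of the pathways of interest (the role pathNum seems to play)
path_num <- c(1, 2)

# Which pathways do these rows correspond to?
print(enrich_toy[path_num, c("ID", "Description")])

# Assumed pooling rule: split each gene string and keep every gene once
genes <- unique(unlist(strsplit(enrich_toy$geneID[path_num], "/", fixed = TRUE)))
print(genes)
```

On the real object, `pway@result[c(15, 16), c("ID", "Description")]` gives the same kind of overview.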
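Outlook
In this post, we downloaded an example dataset from GEO, ran differential expression and enrichment analysis, and then showed how CBNplot can display the regulatory relationships among genes in a pathway of interest. Keep in mind that these relationships are predictions that CBNplot infers from gene expression levels across samples; they do not show that the regulation actually exists. In practice, a direct interaction between two genes still needs to be confirmed with experimental data such as ChIP-seq or ATAC-seq. Imperfect predictions are still better than none: combined with biological knowledge and a literature search, they let us formulate a few hypotheses first and then test them experimentally.
That's it for this post. In the next one, Immugent will show how to use CBNplot for pathway-level visualization.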