GSEA is designed to analyze ranked lists of all available genes and does not require a threshold.
Y叔的clusterProfiler做GSEA分析超级方便~~
参数解读:
- GSEA searches for pathways whose genes are enriched at the top or bottom of the ranked gene list, more so than expected by chance alone.
- To calculate an enrichment score (ES) for a pathway, GSEA progressively examines genes from the top to the bottom of the ranked list, increasing the ES if a gene is part of the pathway and decreasing the score otherwise.
- These running sum values are weighted, so that enrichment in the very top- (and bottom-) ranking genes is amplified, whereas enrichment in genes with more moderate ranks are not amplified.
- The ES score is calculated as the maximum value of the running sum and normalized relative to pathway size, resulting in a normalized enrichment score (NES) that reflects the enrichment of the pathway in the list. Positive and negative NES values represent enrichment at the top and bottom of the list, respectively.
- Finally, a permutation-based P value is computed and corrected for multiple testing to produce a permutation-based false-discovery rate (FDR) Q value that ranges from 0 (highly significant) to 1 (not significant)
- Resulting pathways are selected using the FDR Q value threshold (e.g., Q < 0.05) and ranked using NES. In addition, the ‘leading edge’ aspect of the GSEA analysis identifies specific genes that most strongly contribute to the detected enrichment signal of a pathway.
从以上参数解读了解到,结果可通过 FDR q-value筛选,利用NES进行排序解读。
# Select gene set form msigdf
msigdf.human <-msigdf::msigdf.human
GeneS<-msigdf.human[grepl("LYSOSOM",msigdf.human$geneset),]%>% select(geneset, symbol) %>% as.data.frame
# gene.rank is a data.frame, which contains Gene and FC (Fold-Change)
gene.rank
gene.sort <- arrange(gene.rank, desc(logFC)) # 降序排列
geneList1<-gene.sort$FC
names(geneList1)<-gene.sort$Gene # 命名
# GSEA
gsea<- GSEA(geneList, TERM2GENE = GeneS, verbose=FALSE, pvalueCutoff = 0.5)
# Note: 1. 结果是不断变的,原因是by="fgsea",可以将nPerm设置的大一些,默认是1000.
# 2. 原始文献中的选择:p value < 0.05, FDR < 0.25,可结合自己需求调整。
# 可视化第一个结果,gseaplot2中by函数可以选择部分结果展示,如"preranked".
gseaplot2(gsea,geneSetID=1,title=gsea$Description[1],pvalue_table=T)
# geneSetID多选择几个,即可实现多个gene set的可视化。