题目:Undulating changes in human plasma proteome profiles across the lifespan
主要是学习他的数据分析方式,所以不对introduction进行解读。
- Linear modeling links the plasma proteome to functional aging and identifies a conserved aging signature.
Result中第一部分主要包括三部分分析:1. 回归分析;2. Sliding enrichment pathway analysis (SEPA)。
-
线性模型,以下这几张图:
文中对这一部分的描述如下:
我利用自己的数据模仿进行了以上的分析,但没有绘图:
library(emmeans) # 计算 effect size
library(car) # 用于type2型ANOVA
attach(lmdataclin)
lm.function<-function(x){
# calculate
lm.pro<-lm(x~AGE+SEX,data=lmdataclin)
x.age<-lm.pro$coefficients[2]
y.age<-summary(lm.pro)$"coefficients"[2,4]
x.sex<-lm.pro$coefficients[3]
y.sex<-summary(lm.pro)$"coefficients"[3,4]
mm<-Anova(lm.pro,type=2)
eta_sq(mm,partial=TRUE)
partial.age<-eta_sq(mm,partial=TRUE)[1,2]
partial.sex<-eta_sq(mm,partial=TRUE)[2,2]
lm.list<-data.frame(ß.age=x.age, pvalue.age=y.age, ß.sex=x.sex,pvalue.sex=y.sex,effect.size.age=partial.age,effect.size.sex=partial.sex)
return(lm.list)
}
aov.pro<-apply(lmdataclin[,100:651],2,lm.function) # 批量处理
sum.aov<-do.call(rbind,lm.pro) # 进行合并
结果如下:
p.value的校正可参考:
https://blog.csdn.net/zhu_si_tao/article/details/71077703?depth_1-utm_source=distribute.pc_relevant.none-task&utm_source=distribute.pc_relevant.none-task
具体计算:
p.adjust(sum.aov$pvalue.age,method="BH",n=length(sum.aov$pvalue.age)) # BH就是FDR校正
- Sliding Enrichment pathway analysis (SEPA)
方法中对SEPA的介绍如下:
涉及到了GSEA的内容,可参考以下内容:
https://www.jianshu.com/p/e28783bdd092
https://www.jianshu.com/p/b409a5576ce1
我下载了补充材料, 里面有2025个蛋白的信息,进行尝试。
Method 1. clusterProfiler运行之后,和报道的结果不太一致;
Gene.back<-str_match(data$EntrezGeneSymbol,"[A-Z0-9]*") %>% as.vector
Gene.back<-bitr(Gene.back, fromType = "SYMBOL",toType = "ENTREZID",OrgDb = org.Hs.eg.db)
Gene.back<-Gene.back$ENTREZID
Gene.back[duplicated(Gene.back)]
data.rank.up<-data[data$Coefficient.Age>0,] %>% .[1:100,]
Gene.rank.up<-str_match(data.rank.up$EntrezGeneSymbol," [A-Z0-9]*") %>% as.vector
Gene.rank.up<-bitr(Gene.rank.up, fromType = "SYMBOL",toType = "ENTREZID",OrgDb = org.Hs.eg.db)
Gene.rank.up<-Gene.rank.up$ENTREZID
Gene.rank.up[duplicated(Gene.rank.up)]
go.up<- enrichGO(gene = Gene.rank.up,
OrgDb=org.Hs.eg.db,
ont = "ALL", # MF,BP,CC
pAdjustMethod = "BH",
minGSSize = 1,
maxGSSize = 500,
pvalueCutoff = 1,
qvalueCutoff = 1,
readable = TRUE,
universe = Gene.back,
keyType = "ENTREZID")
Method 2. gprofiler2,没有结果。
library(ggplot2)
library(gprofiler2)
Gene.rank.up.gp<- str_match(data.rank.up$EntrezGeneSymbol,"[A-Z0-9]*") %>% na.omit%>%unique %>% as.vector
Gene.back.gp<-str_match(data$EntrezGeneSymbol,"[A-Z0-9]*") %>% na.omit%>%unique %>% as.vector
# query type: Gene symbol
gostres <- gost(query=Gene.rank.up.gp,organism = "hsapiens", ordered_query = TRUE,
multi_query = FALSE, significant = TRUE, exclude_iea = FALSE,
measure_underrepresentation = FALSE, evcodes = FALSE,
user_threshold = 0.05, correction_method = "fdr",
domain_scope = "annotated", #custom_bg = Gene.back.gp,
numeric_ns = "", sources = c("GO:BP") , as_short_link = FALSE)
p <- gostplot(gostres)
暂时不知道问题出在哪,找到方法后再补充~
-
Prediction of human biological age using the plasma proteome.
作者是用glmnet包做的,目前这个包已经更新,需要3.6.0版本以上的R才可以安装~
未完,待续.....