TCGA+biomarker: Risk Factor Association Plot

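Introduction: The risk factor association plot usually accompanies a Cox proportional hazards model and is typically drawn as the three-panel figure shown below. It shows how the high-risk and low-risk groups separated by the Cox model differ in group composition, survival time and status (alive/dead), and the expression of the genes of interest (usually the genes used to build the model). When the model does not involve gene expression, the plot usually shows only the top two panels.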

image

Reading the risk factor association plot

image

The green dots in the figure represent the surviving PTC patients, and the red dots represent the dead PTC patients. The dotted line represents the median value of risk score. The left side of the dotted line represents the low risk score group, and the right side of the dotted line represents the high risk score group. With the increase of risk score in PTC patients, the number of red dots increased gradually, and the number of dead PTC patients increased. It shows that the patients in high-risk group have poorer survival and higher risk of death.

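Top panel: the predicted risk score of each patient, sorted in ascending order. The horizontal dotted line marks the median risk score, which splits the patients into two groups: low risk (green) and high risk (red).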

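Bottom panel: survival time of the patients, ordered by predicted risk score. The plot suggests that survival time in the low-risk group is somewhat longer than in the high-risk group. Green dots are patients who are alive and red dots are patients who have died; the high-risk group clearly contains more deaths than the low-risk group.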
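The grouping described above can be sketched on a handful of made-up values. This is only an illustration: the risk scores and outcomes below are invented, and the names `toy`, `cutoff`, and `deaths_by_group` are hypothetical.

```r
# Toy risk scores and outcomes (made-up values, for illustration only)
risk_score <- c(0.4, 2.1, 0.9, 1.5, 0.7, 3.2)
event      <- c(0,   1,   0,   1,   0,   1)   # 1 = dead, 0 = alive

# Order patients by increasing risk score, as on the x-axis of the plot
ord <- order(risk_score)
toy <- data.frame(rank = seq_along(ord),
                  risk_score = risk_score[ord],
                  event = event[ord])

# Split at the median risk score (the dotted line in the top panel)
cutoff <- median(toy$risk_score)
toy$riskgroup <- ifelse(toy$risk_score >= cutoff, "high", "low")

# Count deaths in each group (what the red dots in the bottom panel convey)
deaths_by_group <- tapply(toy$event, toy$riskgroup, sum)
print(toy)
print(deaths_by_group)
```

In real data the separation is rarely this clean; the toy values were chosen so that the pattern is easy to see.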

How do you draw the risk factor plot?

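The code below comes from the timeROC tutorial (ROC curves that account for survival time) published by the 生信星球 WeChat public account. This post does not show the upstream data preparation; it covers the later model building and plotting. A risk factor association plot is usually drawn after the risk model has been built, as a visual summary of how the risk score relates to survival and gene expression.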
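The input file itself is not provided in this post. If you only want to try the plotting steps, a synthetic stand-in can be simulated along these lines. Everything here is an assumption made for illustration: the sample size, the distributions, and the name `sim_dat`. Only the column layout (`time`, `event`, then the 8 miRNA columns last) follows what the code in this post expects.

```r
# Synthetic stand-in for risk_data.csv (NOT real data; illustration only)
set.seed(1)
n    <- 100
mirs <- c("miR31", "miR196b", "miR766", "miR519a1",
          "miR375", "miR187", "miR331", "miR101")

# Random "expression" values, one column per miRNA
expr <- matrix(rnorm(n * length(mirs)), nrow = n,
               dimnames = list(NULL, mirs))

# Survival time in years; higher miR31 shortens survival (arbitrary choice)
time  <- rexp(n, rate = 0.2 * exp(0.5 * expr[, "miR31"]))
event <- rbinom(n, 1, 0.6)   # 1 = dead, 0 = alive/censored

# The miRNA columns come last, as the heatmap step assumes
sim_dat <- data.frame(time = time, event = event, expr)

# Uncomment to write the file that read.csv() loads:
# write.csv(sim_dat, "risk_data.csv", row.names = FALSE)
```

Because the values are random, the resulting figure will not resemble the published one; it only lets the code run end to end.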

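rm(list = ls())  # start from a clean workspace
options(stringsAsFactors = FALSE)
library(survival)

# Load the data: basic clinical information plus the expression values of the miRNAs used in the model.
# High-throughput profiling yields many differentially expressed miRNAs; upstream analyses narrow them down to the ones used in the Cox model.
# Variable selection is assumed to be done already: 8 miRNAs go into the Cox proportional hazards model.
dat = read.csv("risk_data.csv", header = TRUE)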

# Fit the multivariable Cox regression model
s = Surv(time, event) ~ miR31 + miR196b + miR766 + miR519a1 + miR375 + miR187 + miR331 + miR101
model <- coxph(s, data = dat)
summary(model)

# Use predict() on the coxph model (survival package) to compute each patient's risk score
RiskScore <- predict(model, type = "risk")
names(RiskScore) = rownames(dat)

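# Start building the data for the risk plot
fp <- RiskScore
phe <- dat
fp_dat = data.frame(patientid = 1:length(fp), fp = as.numeric(sort(fp)))
# Add the risk group: split patients at the median risk score;
# at or above the median is the high-risk group, below the median is the low-risk group
fp_dat$riskgroup = ifelse(fp_dat$fp >= median(fp_dat$fp), 'high', 'low')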

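# Survival time and status, in the same order as the sorted risk scores
sur_dat = data.frame(patientid = 1:length(fp),
                     time = phe[names(sort(fp)), 'time'],
                     event = phe[names(sort(fp)), 'event'])
sur_dat$event = ifelse(sur_dat$event == 0, 'alive', 'death')
sur_dat$event = factor(sur_dat$event, levels = c("death", "alive"))
# Expression matrix; this assumes the 8 miRNAs are the last 8 columns of dat
exp_dat = dat[names(sort(fp)), (ncol(dat) - 7):ncol(dat)]
# fp_dat is used for the first panel
# sur_dat is used for the second panel
# exp_dat is used for the third panel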

# Panel 1: sorted risk scores
library(ggplot2)
# Colors are assigned in alphabetical order of riskgroup: high = red, low = green
p1 = ggplot(fp_dat, aes(x = patientid, y = fp)) + geom_point(aes(color = riskgroup)) +
  scale_colour_manual(values = c("red", "green")) +
  theme_bw() + labs(x = "Patient ID(increasing risk score)", y = "Risk score") +
  geom_hline(yintercept = median(fp_dat$fp), colour = "black", linetype = "dotted", size = 0.8) +
  geom_vline(xintercept = sum(fp_dat$riskgroup == "low"), colour = "black", linetype = "dotted", size = 0.8)
p1

# Panel 2: survival time and status
# Colors follow the factor levels of event: death = red, alive = green
p2 = ggplot(sur_dat, aes(x = patientid, y = time)) + geom_point(aes(col = event)) + theme_bw() +
  scale_colour_manual(values = c("red", "green")) +
  labs(x = "Patient ID(increasing risk score)", y = "Survival time(year)") +
  geom_vline(xintercept = sum(fp_dat$riskgroup == "low"), colour = "black", linetype = "dotted", size = 0.8)
p2

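# Panel 3: expression heatmap
library(pheatmap)
mycolors <- colorRampPalette(c("white", "green", "red"), bias = 1.2)(100)
# Scale each miRNA across patients, then transpose so that miRNAs are rows and patients are columns
tmp = t(scale(exp_dat))
# Clip the z-scores to [-1, 1] so that extreme values do not dominate the color scale
tmp[tmp > 1] = 1
tmp[tmp < -1] = -1
# cluster_cols = FALSE keeps the patients in risk-score order
p3 = pheatmap(tmp, color = mycolors, show_colnames = FALSE, cluster_cols = FALSE)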

# Stack the three panels into one figure
library(ggplotify)
plots = list(p1, p2, as.ggplot(as.grob(p3)))
library(gridExtra)
lay1 = rbind(c(rep(1, 7)), c(rep(2, 7)), c(rep(3, 7))) # layout matrix: one panel per row
# heights sets the relative height of the three rows
grid.arrange(grobs = plots, layout_matrix = lay1, heights = c(2, 3, 2))

The resulting figure looks roughly like this:

image
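A note on the risk score used throughout: `predict(model, type = "risk")` returns the exponential of the linear predictor, so sorting by either quantity gives the same patient order. The sketch below checks this on the `lung` dataset that ships with the survival package, since the post's own data file is not available here. The covariates `age` and `ph.ecog` are an arbitrary choice for illustration, not part of the original analysis.

```r
library(survival)

# Fit a small Cox model on the built-in lung data (illustration only)
fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)

lp   <- predict(fit, type = "lp")    # linear predictor
risk <- predict(fit, type = "risk")  # risk score, as used for the plot

# The risk score is exp() of the linear predictor
same <- isTRUE(all.equal(risk, exp(lp)))
print(same)

# exp() is monotonic, so the median split yields the same two groups either way
table(risk >= median(risk), lp >= median(lp))
```

How the linear predictor is centered depends on the `reference` argument of `predict.coxph`, so the absolute values of the score are less meaningful than the ordering and the median split.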

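Previous posts in this series
TCGA+biomarker: Common ways to present results
TCGA+biomarker: Sample baseline table
TCGA+biomarker: Univariate Cox regression
TCGA+biomarker: Multivariable Cox regression
TCGA+biomarker: Cox regression forest plot
TCGA+biomarker: Calibration curve
TCGA+biomarker: C-index
TCGA+biomarker: Decision curve analysis (DCA)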

For more content, follow the WeChat public account "YJY技能修炼".