Stratagem 14: Borrow a Corpse to Bring Back the Soul
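Superstitious people believed that after death a soul could attach itself to another person's corpse and come back to life. The phrase later came to describe something that had been destroyed or had declined reappearing under another name or in another form. In military strategy, it means a commander should seize every opportunity, even things that seem to have little use, to win the initiative and grow stronger. Put to timely use, such things can turn disadvantage into advantage and even defeat into victory.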
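We demonstrate how to mitigate the effects of cell cycle heterogeneity in scRNA-seq data. We calculate cell cycle phase scores based on canonical markers and regress these scores out of the data during pre-processing. We show this on a dataset of murine hematopoietic progenitors (Nestorowa et al., Blood 2016). You can download the files needed to run this vignette here.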
library(Seurat)
# Read in the expression matrix. The first row is a header row; the first column holds the row names
exp.mat <- read.table(file = "../data/nestorawa_forcellcycle_expressionMatrix.txt", header = TRUE,
as.is = TRUE, row.names = 1)
# A list of cell cycle markers, from Tirosh et al, 2015, is loaded with Seurat. We can
# segregate this list into markers of G2/M phase and markers of S phase
s.genes <- cc.genes$s.genes
g2m.genes <- cc.genes$g2m.genes
# Create our Seurat object and complete the initialization steps
marrow <- CreateSeuratObject(counts = exp.mat)
marrow <- NormalizeData(marrow)
marrow <- FindVariableFeatures(marrow, selection.method = "vst")
marrow <- ScaleData(marrow, features = rownames(marrow))
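The base-R sketch below gives a rough idea of what these initialization calls compute. It mimics LogNormalize, which is the default method of NormalizeData(), and the per-gene centering and scaling done by ScaleData(). The helpers log_normalize() and scale_genes() and the toy Poisson counts are hypothetical. Seurat's own code also handles sparse matrices and zero-variance genes. As we understand it, Seurat also caps scaled values at scale.max = 10, which this sketch copies.

```r
# Rough base-R sketch of LogNormalize + ScaleData (hypothetical helpers, toy data)
log_normalize <- function(counts, scale.factor = 1e4) {
  # Divide each cell (column) by its total counts, multiply by scale.factor, then log1p
  log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)
}

scale_genes <- function(norm, scale.max = 10) {
  # Center and scale each gene (row) across cells, then cap values above scale.max
  z <- t(scale(t(norm)))
  z[z > scale.max] <- scale.max
  z
}

set.seed(1)
counts <- matrix(rpois(200, lambda = 5), nrow = 20,
                 dimnames = list(paste0("gene", 1:20), paste0("cell", 1:10)))
norm   <- log_normalize(counts)
scaled <- scale_genes(norm)
round(scaled[1:3, 1:4], 2)
```

After normalization, every cell sums to the same total (10,000 before the log). After scaling, every gene has mean 0 across cells. This is what lets PCA compare genes with very different expression levels.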
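If we run a PCA on our object using the variable genes found by FindVariableFeatures() above, most of the variance is explained by lineage. However, PC8 and PC10 split on cell-cycle genes, including TOP2A and MKI67. We will attempt to regress this signal out of the data so that cell-cycle heterogeneity does not contribute to PCA or downstream analysis.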
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), ndims.print = 6:10, nfeatures.print = 10)
## PC_ 6
## Positive: SELL, ARL6IP1, CCL9, CD34, ADGRL4, BPIFC, NUSAP1, FAM64A, CD244, C030034L19RIK
## Negative: LY6C2, AA467197, CYBB, MGST2, ITGB2, PF4, CD74, ATP1B1, GP1BB, TREM3
## PC_ 7
## Positive: HDC, CPA3, PGLYRP1, MS4A3, NKG7, UBE2C, CCNB1, NUSAP1, PLK1, FUT8
## Negative: F13A1, LY86, CFP, IRF8, CSF1R, TIFAB, IFI209, CCR2, TNS4, MS4A6C
## PC_ 8
## Positive: NUSAP1, UBE2C, KIF23, PLK1, CENPF, FAM64A, CCNB1, H2AFX, ID2, CDC20
## Negative: WFDC17, SLC35D3, ADGRL4, VLDLR, CD33, H2AFY, P2RY14, IFI206, CCL9, CD34
## PC_ 9
## Positive: IGKC, JCHAIN, LY6D, MZB1, CD74, IGLC2, FCRLA, IGKV4-50, IGHM, IGHV9-1
## Negative: SLC2A6, HBA-A1, HBA-A2, IGHV8-7, FCER1G, F13A1, HBB-BS, PLD4, HBB-BT, IGFBP4
## PC_ 10
## Positive: CTSW, XKRX, PRR5L, RORA, MBOAT4, A630014C17RIK, ZFP105, COL9A3, CLEC2I, TRAT1
## Negative: H2AFX, FAM64A, ZFP383, NUSAP1, CDC25B, CENPF, GBP10, TOP2A, GBP6, GFRA1
DimHeatmap(marrow, dims = c(8, 10))
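The toy simulation below, in base R and not on the Nestorowa data, shows why cycling genes can surface on their own principal components. A hypothetical "lineage" program drives 20 genes, and a weaker hypothetical "cycle" program drives 10 genes named CC1 to CC10. Here prcomp() stands in for RunPCA(). All gene names, effect sizes and cell counts are invented for illustration.

```r
set.seed(42)
n_cells <- 300
lineage <- rep(c(-1, 1), length.out = n_cells)  # two hypothetical lineages
cycle   <- rnorm(n_cells)                        # hypothetical proliferation program
genes <- c(paste0("LIN", 1:20), paste0("CC", 1:10), paste0("OTHER", 1:20))
expr  <- matrix(rnorm(n_cells * length(genes)), nrow = n_cells,
                dimnames = list(NULL, genes))    # cells x genes, as prcomp() expects
expr[, 1:20]  <- expr[, 1:20]  + 3 * lineage     # lineage: the strongest source of variance
expr[, 21:30] <- expr[, 21:30] + 2 * cycle       # weaker cell-cycle signal

pca <- prcomp(expr, center = TRUE, scale. = TRUE)
# Which PC tracks the simulated cycle program, and which genes load most on it?
cc_pc     <- which.max(abs(cor(pca$x, cycle)))
top_genes <- names(sort(abs(pca$rotation[, cc_pc]), decreasing = TRUE))[1:10]
cc_pc
top_genes
```

The lineage signal claims the first component. The cycle genes then dominate a later component of their own, which is the pattern seen for PC8 and PC10 above.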
Assign cell-cycle scores
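First, we assign each cell a score based on its expression of G2/M and S phase markers. These marker sets should be anticorrelated in their expression levels. Cells expressing neither set are likely not cycling and are in G1 phase.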
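We assign scores with the CellCycleScoring() function. It stores the S and G2/M scores in the object metadata, along with the predicted classification of each cell as G2M, S or G1 phase. CellCycleScoring() can also set the identity of the Seurat object to the cell-cycle phase if you pass set.ident = TRUE (the original identities are stored as old.ident). Note that Seurat does not use the discrete classifications (G2M/G1/S) in downstream cell cycle regression. Instead, it uses the quantitative scores for G2M and S phase. We still provide the predicted classifications in case they are of interest.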
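The phase label can be read as a simple rule on the two scores. The sketch below is our reading of how CellCycleScoring() derives it, and it is an assumption worth checking against the function source. A cell with both scores below zero is called G1. Otherwise the larger score wins, and an exact tie is left as "Undecided". The helper assign_phase() is hypothetical. The input scores are those of the first six cells in the head(marrow[[]]) output of the next step.

```r
# Hypothetical helper: turn S and G2M scores into a discrete phase label
assign_phase <- function(s_score, g2m_score) {
  ifelse(s_score < 0 & g2m_score < 0, "G1",         # neither program active
    ifelse(s_score == g2m_score, "Undecided",        # exact tie
      ifelse(s_score > g2m_score, "S", "G2M")))      # otherwise the larger score wins
}

# S.Score and G2M.Score of the first six cells in the metadata printed below
s.score   <- c(-0.14248691, -0.16915786, -0.34627038, -0.44270212, 0.55854051, 0.07116218)
g2m.score <- c(-0.4680395, 0.5851766, -0.3971879, 0.6820229, 0.1284359, 0.3166073)
assign_phase(s.score, g2m.score)
# "G1" "G2M" "G1" "G2M" "S" "G2M", matching the Phase column
```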
marrow <- CellCycleScoring(marrow, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE)
# view cell cycle scores and phase assignments
head(marrow[[]])
## orig.ident nCount_RNA nFeature_RNA S.Score G2M.Score Phase
## Prog_013 Prog 2563089 10211 -0.14248691 -0.4680395 G1
## Prog_019 Prog 3030620 9991 -0.16915786 0.5851766 G2M
## Prog_031 Prog 1293487 10192 -0.34627038 -0.3971879 G1
## Prog_037 Prog 1357987 9599 -0.44270212 0.6820229 G2M
## Prog_008 Prog 4079891 10540 0.55854051 0.1284359 S
## Prog_014 Prog 2569783 10788 0.07116218 0.3166073 G2M
## old.ident
## Prog_013 Prog
## Prog_019 Prog
## Prog_031 Prog
## Prog_037 Prog
## Prog_008 Prog
## Prog_014 Prog
# Visualize the distribution of cell cycle markers across phases
RidgePlot(marrow, features = c("PCNA", "TOP2A", "MCM6", "MKI67"), ncol = 2)
# Running a PCA on cell cycle genes reveals, unsurprisingly, that cells separate entirely by
# phase
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrow)
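We score single cells based on the scoring strategy described in Tirosh et al. 2016. See ?AddModuleScore() in Seurat for more information. This function can be used to calculate supervised module scores for any gene list.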
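The idea behind a module score is to compare a gene set's average expression with that of control genes drawn from similar expression levels. The base-R function below is a simplified sketch of that idea, not Seurat's code. The function module_score() is hypothetical. It ranks genes by average expression, splits them into nbin equal-size bins, and samples up to ctrl control genes from each feature's bin. The defaults of 24 bins and 100 control genes mirror what we understand to be the AddModuleScore() defaults. The toy matrix stands in for normalized data.

```r
# Simplified, hypothetical module score: mean of the gene set minus mean of
# expression-matched control genes, computed per cell
module_score <- function(expr, features, nbin = 24, ctrl = 100, seed = 1) {
  set.seed(seed)  # fixed seed so the sampled control genes are reproducible
  avg  <- rowMeans(expr)
  # equal-size bins of genes ranked by average expression
  bins <- cut(rank(avg, ties.method = "random"), breaks = nbin, labels = FALSE)
  names(bins) <- rownames(expr)
  # for every feature, sample control genes from the same expression bin
  controls <- unique(unlist(lapply(features, function(g) {
    pool <- names(bins)[bins == bins[[g]]]
    sample(pool, min(ctrl, length(pool)))
  })))
  colMeans(expr[features, , drop = FALSE]) - colMeans(expr[controls, , drop = FALSE])
}

# Toy normalized data: 2000 genes x 40 cells; genes g1-g10 are boosted in cells c1-c20
set.seed(7)
gene_means <- runif(2000, 0, 5)
expr <- matrix(rnorm(2000 * 40, mean = rep(gene_means, 40)), nrow = 2000,
               dimnames = list(paste0("g", 1:2000), paste0("c", 1:40)))
gene_set <- paste0("g", 1:10)
expr[gene_set, 1:20] <- expr[gene_set, 1:20] + 1
scores <- module_score(expr, gene_set)
round(c(boosted = mean(scores[1:20]), other = mean(scores[21:40])), 2)
```

Subtracting expression-matched controls means a gene set is not scored high just because its genes happen to be highly expressed overall.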
Regress out cell-cycle scores during data scaling
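We now attempt to subtract ("regress out") this source of heterogeneity from the data. For users of Seurat v1.4, this was implemented in RegressOut. However, the results of this procedure are stored in the scaled data slot and would therefore overwrite the output of ScaleData(). For that reason, we now merge this functionality into the ScaleData() function itself.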
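For each gene, Seurat models the relationship between gene expression and the S and G2M cell cycle scores. The scaled residuals of this model represent a "corrected" expression matrix that can be used downstream for dimensional reduction.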
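The per-gene model can be sketched in base R. This is a simplified illustration, not Seurat's implementation. The hypothetical helper regress_out() fits an ordinary least-squares model of each gene on an intercept plus the supplied covariates (here S.Score and G2M.Score). It keeps the residuals and then centers and scales them per gene. The scale.max clipping applied by ScaleData() is omitted, and the toy data are invented.

```r
# Hypothetical helper: regress covariates out of every gene, then scale the residuals
regress_out <- function(expr, covariates) {
  # expr: genes x cells matrix; covariates: data.frame with one row per cell
  X   <- model.matrix(~ ., data = covariates)  # intercept + one column per covariate
  fit <- lm.fit(X, t(expr))                     # least-squares fit for every gene at once
  resid <- matrix(fit$residuals, nrow = ncol(expr),
                  dimnames = dimnames(t(expr)))  # cells x genes residuals
  t(scale(resid))                               # center/scale each gene, back to genes x cells
}

set.seed(3)
n  <- 100
cc <- data.frame(S.Score = rnorm(n), G2M.Score = rnorm(n))
expr <- rbind(
  cycling      = 2 * cc$S.Score - cc$G2M.Score + rnorm(n, sd = 0.5),  # depends on both scores
  housekeeping = rnorm(n, mean = 5)                                    # unrelated to the cycle
)
corrected <- regress_out(expr, cc)
round(cor(corrected["cycling", ], cc$S.Score), 10)  # no linear S-score signal remains
```

Least-squares residuals are orthogonal to every regressor. As a result, the corrected values have zero correlation with both scores, while any variation not explained by the scores is kept.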
marrow <- ScaleData(marrow, vars.to.regress = c("S.Score", "G2M.Score"), features = rownames(marrow))
# Now, a PCA on the variable genes no longer returns components associated with cell cycle
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)
## PC_ 1
## Positive: BLVRB, CAR2, KLF1, AQP1, CES2G, ERMAP, CAR1, FAM132A, RHD, SPHK1
## Negative: TMSB4X, H2AFY, CORO1A, PLAC8, EMB, MPO, PRTN3, CD34, LCP1, BC035044
## PC_ 2
## Positive: ANGPT1, ADGRG1, MEIS1, ITGA2B, MPL, DAPP1, APOE, RAB37, GATA2, F2R
## Negative: LY6C2, ELANE, HP, IGSF6, ANXA3, CTSG, CLEC12A, TIFAB, SLPI, ALAS1
## PC_ 3
## Positive: APOE, GATA2, NKG7, MUC13, MS4A3, RAB44, HDC, CPA3, FCGR3, TUBA8
## Negative: FLT3, DNTT, LSP1, WFDC17, MYL10, GIMAP6, LAX1, GPR171, TBXA2R, SATB1
## PC_ 4
## Positive: CSRP3, ST8SIA6, DNTT, MPEG1, SCIN, LGALS1, CMAH, RGL1, APOE, MFSD2B
## Negative: PROCR, MPL, HLF, MMRN1, SERPINA3G, ESAM, GSTM1, D630039A03RIK, MYL10, LY6A
## PC_ 5
## Positive: CPA3, LMO4, IKZF2, IFITM1, FUT8, MS4A2, SIGLECF, CSRP3, HDC, RAB44
## Negative: PF4, GP1BB, SDPR, F2RL2, RAB27B, SLC14A1, TREML1, PBX1, F2R, TUBA8
# When running a PCA on only cell cycle genes, cells no longer separate by cell-cycle phase
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrow)
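Because the best cell cycle markers are extremely well conserved across tissues and species, we have found that this procedure works robustly and reliably on diverse datasets.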
Alternate workflow
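The procedure above removes all signal associated with the cell cycle. In some cases, we have found that this can hurt downstream analysis. This happens particularly in differentiation processes such as murine hematopoiesis, where stem cells are quiescent and differentiated cells are proliferating (or vice versa). In such cases, regressing out all cell cycle effects can also blur the distinction between stem and progenitor cells.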
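As an alternative, we suggest regressing out the difference between the G2M and S phase scores. The signal that separates non-cycling cells from cycling cells is then kept. Differences in cell cycle phase among proliferating cells, which are often uninteresting, are regressed out of the data.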
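A toy example shows the effect of regressing out only the difference. Everything here is hypothetical. There are three groups of 10 cells with idealized, noise-free phase scores, and one "proliferation" gene that is equally elevated in S and G2M cells. Plain lm() residuals stand in for ScaleData(vars.to.regress = ...). The helper cycling_gap() is a made-up summary of how far cycling cells sit from G1 cells after correction.

```r
# Hypothetical toy data: 10 cells per phase with idealized, noise-free phase scores
phase     <- rep(c("G1", "S", "G2M"), each = 10)
S.Score   <- unname(c(G1 = -0.4, S = 0.6, G2M = -0.1)[phase])
G2M.Score <- unname(c(G1 = -0.4, S = -0.1, G2M = 0.6)[phase])
set.seed(11)
# Hypothetical "proliferation" gene: up in every cycling cell, whatever its phase
prolif <- S.Score + G2M.Score + rnorm(length(phase), sd = 0.1)

CC.Difference <- S.Score - G2M.Score
full_resid <- resid(lm(prolif ~ S.Score + G2M.Score))  # regress out both scores
alt_resid  <- resid(lm(prolif ~ CC.Difference))        # regress out only the difference

# Mean gap between cycling (S or G2M) and non-cycling (G1) cells after correction
cycling_gap <- function(x) mean(x[phase != "G1"]) - mean(x[phase == "G1"])
cycling_gap(full_resid)  # close to 0: cycling vs. non-cycling signal removed
cycling_gap(alt_resid)   # close to 1.3: cycling vs. non-cycling signal kept
```

In this toy, S.Score + G2M.Score, which marks "is cycling", is uncorrelated with S.Score - G2M.Score, which marks "which phase". Regressing out the difference therefore leaves the proliferation gene's G1 versus cycling contrast intact.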
marrow$CC.Difference <- marrow$S.Score - marrow$G2M.Score
marrow <- ScaleData(marrow, vars.to.regress = "CC.Difference", features = rownames(marrow))
# cell cycle effects strongly mitigated in PCA
marrow <- RunPCA(marrow, features = VariableFeatures(marrow), nfeatures.print = 10)
## PC_ 1
## Positive: BLVRB, KLF1, ERMAP, FAM132A, CAR2, RHD, CES2G, SPHK1, AQP1, SLC38A5
## Negative: TMSB4X, CORO1A, PLAC8, H2AFY, LAPTM5, CD34, LCP1, TMEM176B, IGFBP4, EMB
## PC_ 2
## Positive: APOE, GATA2, RAB37, ANGPT1, ADGRG1, MEIS1, MPL, F2R, PDZK1IP1, DAPP1
## Negative: CTSG, ELANE, LY6C2, HP, CLEC12A, ANXA3, IGSF6, TIFAB, SLPI, MPO
## PC_ 3
## Positive: APOE, GATA2, NKG7, MUC13, ITGA2B, TUBA8, CPA3, RAB44, SLC18A2, CD9
## Negative: DNTT, FLT3, WFDC17, LSP1, MYL10, LAX1, GIMAP6, IGHM, CD24A, MN1
## PC_ 4
## Positive: CSRP3, ST8SIA6, SCIN, LGALS1, APOE, ITGB7, MFSD2B, RGL1, DNTT, IGHV1-23
## Negative: MPL, MMRN1, PROCR, HLF, SERPINA3G, ESAM, PTGS1, D630039A03RIK, NDN, PPIC
## PC_ 5
## Positive: HDC, LMO4, CSRP3, IFITM1, FCGR3, HLF, CPA3, PROCR, PGLYRP1, IKZF2
## Negative: GP1BB, PF4, SDPR, F2RL2, TREML1, RAB27B, SLC14A1, PBX1, PLEK, TUBA8
# When running a PCA on cell cycle genes, actively proliferating cells remain distinct from G1
# cells; however, within actively proliferating cells, G2M and S phase cells group together
marrow <- RunPCA(marrow, features = c(s.genes, g2m.genes))
DimPlot(marrow)