【单细胞】轨迹分析-slingshot

轨迹推断/分析是研究细胞动力学的重要方法，它是单细胞研究应用中一种常见的手段。目前已经开发出了大量的轨迹分析算法，最常用是monocle系列（前面我们介绍的也有），这些算法都可以很好的计算细胞之间的表达模式并模拟分化过程。但是由于数据集之间各有差异，单一的分析方法并不能满足所有数据需求。因此，使用更多的方法有助于我们找到最适合自己数据集的方法。今天，我们分享另外一个分析方法-Slingshot。

Slingshot是2018年由加州大学的开发的一个用于推断细胞谱系分化结构和顺序的工具，它由两个主要阶段组成：谱系结构的推断，以及每个谱系细胞的伪时间变化的推断。Slingshot结合了复杂单细胞数据所需的高度稳定技术以及识别具有不同监督水平的多种谱系的灵活性，使得Slingshot不拘泥于上游选择，而是在设计时考虑到了灵活性和模块化，以便轻松地适用于多种数据标准化、降维和聚类方法。

Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics

Slingshot工具是一个R包，运行需要输入包含用于谱系推断的坐标矩阵的数据对象。支持的类型包括：matrix、SingleCellExperiment、SlingshotDataSet、PseudotimeOrdering。对象中同时需要包含降维数据矩阵（类似基因表达矩阵）和聚类标签向量(聚类赋值矩阵，类似seurat对象中的meta.data数据），由于Slingshot工具的灵活性，我们可以使用矩阵数据进行降维聚类后分析，也可以直接使用降维聚类完成的Seurat对象进行分析。

=======软件测试======

1. 导入需要的R包

library(slingshot)
library(Seurat)
library(devtools)
library(cowplot)
library(ggplot2)
library(Matrix)
library(dplyr)
library(tradeSeq)
library(RColorBrewer)
library(DelayedMatrixStats)
library(scales)
library(paletteer) 
library(viridis)

2. 加载输入文件

我们使用一个老鼠的测试数据

mouse_data <- readRDS("mouse_data.rds")

image.png

对于Seurat的单细胞数据,我们可以直接转化成SingleCellExperiment

sce <- as.SingleCellExperiment(mouse_data, assay = "RNA") #assay的选择要根据自己的数据

注：Slingshot可以选择起点和终点，但是一般起点的选择要有一定的生物学意义

sce_slingshot1 <- slingshot(sce,      #输入单细胞对象
                     reducedDim = 'UMAP',  #降维方式
                     clusterLabels = sce$celltype,  #cell类型
                     start.clus = 'PMN(0)',       #轨迹起点,也可以不定义
                     approx_points = 150)

通过SlingshotDataSet可以直接查看轨迹信息：

SlingshotDataSet(sce_slingshot1)

从结果可以看出，这个数据推断出了3条轨迹。

image.png

3. 可视化轨迹

定义颜色

cell_pal <- function(cell_vars, pal_fun,...) {
  if (is.numeric(cell_vars)) {
    pal <- pal_fun(100, ...)
    return(pal[cut(cell_vars, breaks = 100)])
  } else {
    categories <- sort(unique(cell_vars))
    pal <- setNames(pal_fun(length(categories), ...), categories)
    return(pal[cell_vars])
  }
}

设置颜色并可视化。这里分组是celltype。在cell type上可视化轨迹

cell_colors <- cell_pal(sce_slingshot1$celltype, brewer_pal("qual", "Set2"))
plot(reducedDims(sce_slingshot1)$UMAP, col = cell_colors, pch=16, asp = 1, cex = 0.8)
lines(SlingshotDataSet(sce_slingshot1), lwd=2, col='black')

#计算celltype坐标位置，用于图中标记
celltype_label <- mouse_data@reductions$umap@cell.embeddings%>% 
  as.data.frame() %>%
  cbind(celltype = mouse_data@meta.data$celltype) %>%
  group_by(celltype) %>%
  summarise(UMAP1 = median(UMAP_1),
            UMAP2 = median(UMAP_2))

for (i in 1:8) {
  text(celltype_label$celltype[i], x=celltype_label$UMAP1[i]-1, y=celltype_label$UMAP2[i])
}

image.png

还可以按样本来着色

group_colors <- cell_pal(sce_slingshot1$orig.ident, brewer_pal("qual", "Set2"))
plot(reducedDims(sce_slingshot1)$UMAP, col = group_colors, pch=16, asp = 1, cex = 0.8)
lines(SlingshotDataSet(sce_slingshot1), lwd=2, col='black')
#添加legend即可
legend("top",
       pch=15,
       legend=unique(sce$orig.ident),
       bty="n",
       col =unique(group_colors),
       border="black",
       horiz=T)

image.png

从前面的图上可以看出，总共有三条轨迹。所以也可以分别可视化三条轨迹。

plot(reducedDims(sce_slingshot1)$UMAP, asp = 1, pch = 16, xlab = "UMAP_1", ylab = "UMAP_2",
     col = hcl.colors(100, alpha = 0.5)[cut(sce_slingshot1$slingPseudotime_1, breaks = 100)])

lines(SlingshotDataSet(sce_slingshot1), linInd = 1, lwd = 2, col = 'black')

image.png

4. 密度图

也可以像monocle一样画拟时密度图，展示细胞或者样本分组在拟时轨迹的分布。这里我们以第1条轨迹为示例：

首先这里查看有哪些轨迹，轨迹中包含哪些细胞

SlingshotDataSet(sce_slingshot1)

image.png

从结果可以看出，第一条轨迹包含5种cell type。

density1 <- list(a_1 = density(slingPseudotime(sce_slingshot1)[colData(sce_slingshot1)$celltype == "PMN(0)", 1], na.rm = T), 
           a_2 = density(slingPseudotime(sce_slingshot1)[colData(sce_slingshot1)$celltype == "PMN(2)", 1], na.rm = T),
           a_3 = density(slingPseudotime(sce_slingshot1)[colData(sce_slingshot1)$celltype == "PMN(5)", 1], na.rm = T),
           a_4 = density(slingPseudotime(sce_slingshot1)[colData(sce_slingshot1)$celltype == "PMN(1)", 1], na.rm = T),
           a_5 = density(slingPseudotime(sce_slingshot1)[colData(sce_slingshot1)$celltype == "PMN(4)", 1], na.rm = T))

#作图范围
xlim <- range(c(density1$a_1$x, density1$a_2$x, density1$a_3$x, density1$a_4$x,density1$a_5$x))
ylim <- range(c(density1$a_1$y, density1$a_2$y, density1$a_3$y, density1$a_4$y, density1$a_5$y))

par(mar=c(6, 6, 6, 6), xpd = TRUE)
plot(xlim, ylim, col = "white", xlab = "Pseudotime", ylab = "")  #基本作图
#添加密度图
for(i in 1:5) {
  polygon(c(min(density1[[i]]$x),density1[[i]]$x, max(density1[[i]]$x)), c(0, density1[[i]]$y, 0),
          col = alpha(brewer.pal(5, "Set1")[i], alpha = 0.5))
}

legend("right",
       inset= -0.2,#legend位于画框右侧正中
       pch=15,
       legend=c("PMN(0)","PMN(2)","PMN(5)","PMN(1)","PMN(4)"),
       bty="n",
       col = alpha(brewer.pal(5, "Set1"), alpha = 0.5),
       border="black",
       title = "celltype",
       cex = 0.8)

image.png

除了细胞类型外，也可以看样本的密度图

density1_s <- list(a_1 = density(slingPseudotime(sce_slingshot1)[colData(sce_slingshot1)$orig.ident == "10X_ntph_F", 1], na.rm = T), 
                 a_2 = density(slingPseudotime(sce_slingshot1)[colData(sce_slingshot1)$orig.ident == "10X_ntph_M", 1], na.rm = T))

#作图范围
xlim <- range(c(density1_s$a_1$x, density1_s$a_2$x))
ylim <- range(c(density1_s$a_1$y, density1_s$a_2$y))

par(mar=c(6, 6, 6, 6), xpd = TRUE)
plot(xlim, ylim, col = "white", xlab = "", ylab = "")#基本作图
#添加密度图
polygon(c(min(density1_s[[1]]$x),density1_s[[1]]$x, max(density1_s[[1]]$x)), c(0, density1_s[[1]]$y, 0),
        col = alpha(brewer.pal(7, "Set1")[6], alpha = 0.5))
polygon(c(min(density1_s[[2]]$x),density1_s[[2]]$x, max(density1_s[[2]]$x)), c(0, density1_s[[2]]$y, 0),
        col = alpha(brewer.pal(7, "Set1")[7], alpha = 0.5))

legend("right",
       inset= -0.3,
       pch=15,
       legend=c("10X_ntph_F","10X_ntph_M"),
       bty="n",
       col = brewer.pal(8, "Set1")[7:8],
       border="black",
       title = "group",
       cex = 0.8)

image.png

为了方便操作，也可以把slingshot结果放到单细胞seurat对象中。例如，这里我们将三个轨迹的pseudotime添加到单细胞对象中（Metadata）。

pseudotime = slingPseudotime(sce_slingshot1)%>% as.data.frame() 
Lineages = colnames(pseudotime)

for(i in 1:3){
  pseudotime_sub <- pseudotime[,i]
  mouse_data <- AddMetaData(object = mouse_data,
                   metadata = pseudotime_sub,
                   col.name = Lineages[i])
}

可以看出，mouse_data中添加了轨迹信息：

image.png

这样我们就可以使用ggplot做密度图了。我们首先定义一个函数，让作图更加方便一些。

pseudotime_density <- function(seurat_obj,
                               Lineage,
                               cluster_label,
                               color){
  df <- data.frame(seurat_obj[[cluster_label]], seurat_obj[[Lineage]]) 
  colnames(df) <- c("celltype", "Lineage")
  na.omit(df) -> df
  p <- ggplot(df, aes(x=Lineage, fill=celltype)) +
    geom_density(alpha=0.5) + 
    theme_bw()+
    scale_fill_manual(values = color)
 # print(p)
  
}

library(dittoSeq)
p1 <- pseudotime_density(mouse_data, Lineage="Lineage1", cluster_label = "celltype", color = dittoColors())+
  theme(axis.title.x  = element_blank())
p2 <- pseudotime_density(mouse_data, Lineage="Lineage1", cluster_label = "orig.ident", color = c("#DC050C","#8DD3C7"))

#拼图
library(patchwork)
p1+p2+plot_layout(ncol = 1,heights=c(1,1),guides = 'collect')

image.png

5. 基因随轨迹表达变化

像monocle2一样，也可以展示基因在不同轨迹随拟时表达变化

slingsce<-SlingshotDataSet(sce_slingshot1)
pseudotimeED <- slingPseudotime(slingsce, na = FALSE)
cellWeightsED <- slingCurveWeights(slingsce)
counts<-sce_slingshot1@assays@data@listData$counts

#这里用的书所有的基因
sce_slinghot <- fitGAM(counts = counts, pseudotime = pseudotimeED, cellWeights = cellWeightsED, nknots = 5, verbose = T)
mean(rowData(sce_slinghot)$tradeSeq$converged)
rowData(sce_slinghot)$assocRes <- associationTest(sce_slinghot, lineages = TRUE, l2fc = log2(2))

assocRes <- rowData(sce_slinghot)$assocRes

#注：fitGAM那个步骤很慢，我这里差不多跑了1个小时。我查看了一下网上的建议，不需要用所有的基因，用那些high variable的gene就可以

下边，我们开始作图：

gene_dynamic <- list()
genes_plot <- c("S100a8","Anxa1","Ly6g","S100a9")
for(i in 1:length(genes_plot)){
  p = plotSmoothers(sce_slinghot, assays(sce_slinghot)$counts,
                    gene =genes_plot[i], alpha = 0.6, border = T, lwd = 2)+
    ggtitle(genes_plot[i])
  gene_dynamic[[i]] <- p
}

Seurat::CombinePlots(gene_dynamic, ncol = 2)

image.png

6. 轨迹基因热图

前面推断了轨迹，并进行了一些简单的可视化。和monocle2一样，关注的还是随轨迹的基因变化。可以借助tradeSeq包进行分析。

这里示例分析一下轨迹1,将轨迹1的拟时添加到Seurat对象

sce_slingshot1$sling_pseudotime = sce_slingshot1[[paste0("slingPseudotime_1")]]
mouse_data$sling_pseudotime = sce_slingshot1$sling_pseudotime

#仅分析在轨迹拟时中的细胞，去除NA
sce_slingshot1_l1 = sce_slingshot1[,!is.na(sce_slingshot1$sling_pseudotime)]
seur = mouse_data[,!is.na(mouse_data$sling_pseudotime)]

#和前面一样这一步会比较慢，这个数据跑了大概1个小时。
sce_slingshot1_l1 <- fitGAM(counts(sce_slingshot1_l1), 
                            cellWeights = rep(1, ncol(sce_slingshot1_l1)), 
                            pseudotime = sce_slingshot1_l1$sling_pseudotime)

ATres <- associationTest(sce_slingshot1_l1)
association_test_tab = as_tibble(cbind(gene = rownames(ATres), ATres))

基因随拟时的分析完成后，我们需要利用Seurat对象和上面的分析结果构建拟时热图matrix，构建一个函数，用来获取matrix。

slingshot_for_plotMatrix <- function(seurat_obj,
                                     n_bins,#拟时需要分割的区间，将相似的拟时区间合并，这类似于我们monocle3中的方式
                                     min_exp)
                                     {
  seurat_meta = seurat_obj@meta.data
  seurat_meta = as_tibble(cbind(cell.id = as.character(rownames(seurat_meta)), seurat_meta))
  seurat_meta = seurat_meta[order(seurat_meta$sling_pseudotime),]
  
  pl_cells = as.character(seurat_meta$cell.id)
  
  #提取表达矩阵,并将cell id的exp排序与前面排序好的cell id一致
  exp = seurat_obj@assays$RNA@data
  exp = exp[,colnames(exp) %in% pl_cells]
  expr_mat = exp[,order(match(colnames(exp), pl_cells))]
  
  expr_mat = as.matrix(expr_mat[rownames(expr_mat) %in% association_test_tab$gene,])
  
  clust_expr_mat = matrix(nrow = nrow(expr_mat), 
                          ncol = n_bins, dimnames = list(rownames(expr_mat), 1:n_bins))
  
  max_pseudotime = max(seurat_meta$sling_pseudotime)
  pseudotime_bin_size = max_pseudotime/n_bins
  
  pseudotime_cluster_stat = NULL
  seurat_obj$pseudotime_bin = NA_integer_
  
  for (i in 1 : n_bins){
    
    bin_cells = seurat_meta$cell.id[(seurat_meta$sling_pseudotime > (i-1)*pseudotime_bin_size & 
                                       seurat_meta$sling_pseudotime <= i*pseudotime_bin_size)]

    
    seurat_obj$pseudotime_bin[colnames(seurat_obj) %in% bin_cells] = i

    #计算基因平均表达量
    if (length(bin_cells)>10){
      m2 = expr_mat[,colnames(expr_mat) %in% bin_cells]
      clust_expr_mat[,i] = apply(m2, 1, mean, na.rm = TRUE)
    }

  }
  
  #数据缩放一下，为了更好的展现热图，并删除低表达基因
  mm1 = clust_expr_mat - apply(clust_expr_mat, 1, mean, na.rm = TRUE)
  mm2 = mm1[apply(abs(mm1),1, max, na.rm = TRUE)>min_exp,]
  
  return(mm2)
  
 }

获取拟时热图矩阵：

mm = slingshot_for_plotMatrix(seurat_obj = seur, n_bins = 20, min_exp = 0.2)

最后用pheatmap作图，和前面使用monocle2类似。

max_range = max(range(is.finite(mm)))
lim = c(-max_range, max_range)

library(pheatmap)
heatmap1 = pheatmap(mm, show_rownames=F, cluster_rows = TRUE,
                    cluster_cols = FALSE, show_colnames = FALSE, 
                    clustering_distance_rows = "euclidean",
                    clustering_method = "ward.D2",
                    treeheight_row = 50,
                    cutree_rows = 5, 
                    color = colorRampPalette(rev(brewer.pal(9, "PRGn")))(250),
                    breaks = seq(lim[1]/4, lim[2]/4, length.out = 251),
                    border_color = NA)

image.png

也可以像以前一样提取出来，修一修。

annotation_row <- data.frame(Cluster=factor(cutree(heatmap1$tree_row, 5)))
row.names(annotation_row) <- rownames(mm)


rowcolor <- c("#85B22E","#E29827","#922927",'#57C3F3','#A758E5')
names(rowcolor) <- c("1","2","3","4","5") #类型颜色

ann_colors <- list(pseudotime=viridis(100),
                  Cluster=rowcolor) #颜色设置
                    
pheatmap(mm,
        useRaster = T,
        cluster_cols=F,
        cluster_rows=T,
        show_rownames=F,
        show_colnames=F,
        clustering_method = "ward.D2",
        clustering_distance_rows = "euclidean",
        cutree_rows=5,
        border_color = NA,
        filename=NA,
        color = colorRampPalette(rev(brewer.pal(9, "PRGn")))(100),
        breaks = seq(lim[1], lim[2], length.out = 100),
        annotation_colors=ann_colors,
        annotation_row = annotation_row
)

image.png

也可以提取上面的热图聚类信息，提取每个module基因，像monocle2一样做GO/KEGG富集分析，然后再重新画图

tree_module = cutree(heatmap1$tree_row, k=5)
tree_module = tibble(gene = names(tree_module), module = as.character(tree_module))
tree_module = tree_module[heatmap1$tree_row$order,]

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 206,214评论 6赞 481
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 88,307评论 2赞 382
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 152,543评论 0赞 341
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 55,221评论 1赞 279
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 64,224评论 5赞 371
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 49,007评论 1赞 284
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 38,313评论 3赞 399
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,956评论 0赞 259
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 43,441评论 1赞 300
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,925评论 2赞 323
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 38,018评论 1赞 333
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,685评论 4赞 322
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 39,234评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 30,240评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,464评论 1赞 261
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 45,467评论 2赞 352
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,762评论 2赞 345

【单细胞】轨迹分析-slingshot

=======软件测试======

1. 导入需要的R包

2. 加载输入文件

3. 可视化轨迹

4. 密度图

5. 基因随轨迹表达变化

6. 轨迹基因热图

推荐阅读更多精彩内容