10X单细胞空间联合分析之三----Spotlight

在我们之前的分享中，已经分享过很多的单细胞和空间联合分析的方法，有关Spotlight，之前也提到过，但是并没有很认真的对关注这个方法，这次的分享，就让我们来参透这个方法；这个软件目前已经发表，文献在SPOTlight:Seeded NMF regression to Deconvolute Spatial Transcriptomics Spots with Single-Cell Transcriptomes,发表于Nucleic acids research，影响因子11分以上。文章大家可以认真研读一下，算法也很经典。

软件联合分析的原理

SPOTlight is centered around a seeded non-negative matrix factorization (NMF) regression, initialized using cell-type marker genes, and nonnegative least squares (NNLS) to subsequently deconvolute ST capture locations (spots).
其中的几个概念我们需要了解；
（1）NMF，这个可以参考文章Non-Negative Matrix Factorization (NMF)。我们这里举一个简单的例子帮助大家理解：
矩阵分解的想法源于我们相信每个电影都是由某些元素组成的，而这些元素到底是什么，有多少，我们是不知道的。
而不同的用户出于个人喜好一定会对这些元素有个评价。
比如：
一个电影里如果只考虑爱情和动作两个因素。
一个用户也一定会有我给爱情打分多一点还是喜欢动作多一点。
如图：

图片.png

期望就是我们已知下面的矩阵，而期望分解出比较合理的上面两个矩阵。这样就可以用来预测别的用户对别的电影的看法了。把这个想法运用到我们的空间数据上，就可以了。
（2） nonnegative least squares (NNLS)，非负最小二乘法，这个可以参考文章【技术分享】非负最小二乘,算法很难，了解一下即可。

总而言之一句话，Spotlight利用单细胞数据的特征，deconvolute ST 数据，从而知道每种细胞类型在空间上的位置。

软件的优势

Using synthetic spots, simulating varying reference quantities and qualities, we confirmed high prediction accuracy also with shallowly sequenced or small-sized scRNA-seq reference datasets（准确度高，文献肯定都这么写）。We trained the NMF regression model with sample-matched or external datasets, resulting in accurate and sensitive spatial predictions。还有一点，Using SPOTlight to detect regional enrichment of immune cells and their co-localization with tumor and adjacent stroma provides an illustrative example in its flexible application spectrum and future potential in digital pathology.（免疫细胞和肿瘤细胞的共定位问题，值得关注）。

（重点）Successful integration of both data modalities could enable an in-depth study of tissue and organ architecture, elucidate cellular cross-talk, spatially track dynamic cell trajectories, and identify disease-specific interaction networks 。Intersecting cell-type-specific genes from scRNA-seq with ST capture sites previously identified local enrichments, sufficient to segment tumor sections into normal and cancerous areas.（这个优点大家要多注意下，单细胞分析得到的通讯啊，轨迹啊，在空间上又是什么状态呢？）

技术示意图

图片.png

我们来看一下这个软件对空间数据分解的过程：
1、最核心的部分，identify cell type-specific topic profiles used to deconvolute ST spots。
这里多说一句，第一步的做法相当于将我们的表达矩阵拆分成了两个矩阵，W和H。（理论部分 NMF(Non-negative matrix factorization)，即对于任意给定的一个非负矩阵V，其能够寻找到一个非负矩阵W和一个非负矩阵H，满足条件V=WH,从而将一个非负的矩阵分解为左右两个非负矩阵的乘积。*其中，V矩阵中每一列代表一个观测(observation)，每一行代表一个特征(feature)；W矩阵称为基矩阵，H矩阵称为系数矩阵或权重矩阵。这时用系数矩阵H代替原始矩阵，就可以实现对原始矩阵进行降维，得到数据特征的降维矩阵。）We set out to use NMF to obtain topic profiles due to its previous success in identifying biologically relevant gene expression programs, as well as its previous implementation in ST analysis。这一步大家着重理解一下，关于NMF。Importantly, the NMF is initialized by the two main matrices: the basis matrix (W) with unique cell type marker genes and weights, and the coefficient matrix (H) in which each row is initialized, specifying the corresponding relationship of a cell to a topic。
2、Factorization is then carried out using non-smooth NMF（因式分解）。promoting cell type-specific topic profiles, while reducing overfitting during training。
3、NNLS regression is used to map each spot’s transcriptome to a topic profile distribution using the unit-variance normalized ST count matrix and the basis matrix previously obtained。回归分析，注意这里用到的矩阵关系。
4、Lastly, NNLS is again applied to determine the weights for each cell type that best fit each spot’s topic profile by minimizing the residuals.
We use a minimum weight contribution threshold to determine which cell types are contributing to the profile of a given spot, also considering the possibility of partial contributions. NNLS also returns a measure of error along with the predicted cell proportions, allowing the user to estimate the reliability of predicted spot compositions

这个地方大家需要注意的是，矩阵的分解，以及分别进行回归，最后一起预测细胞类型的组成。

接下来就是一些运用了，包括文献本身的验证和空间数据上的运用，大家看看即可，肯定效果很不错，要不然发不出来，。

图片.png

最后我们来看一下代码部分：

SPOTlight
这就不照抄别人的代码了，我们只关注重点：spotlight_deconvolution
关于这个函数有几个参数必须要注意：
1、cluster_markers 注意这里需要的marker单细胞分析得到的，但是用多少，需要判断。
2、cl_n Object of integer indicating how many cells to keep from each cluster. If a cluster has n < cl_n then all cells will be selected, if it has more then cl_n will be sampled randomly,100 by default.
3、method Object of class character; Type of method to us to find W and H. Look at NMF package for the options and specifications, by default nsNMF（最重要的一步，因式分解的方法）。

等等等等

代码部分

The goal of SPOTlight is to provide a tool that enables the deconvolution of cell types and cell type proportions present within each capture locations comprising mixtures of cells, originally developed for 10X’s Visium - spatial trancsiptomics- technology, it can be used for all technologies returning mixtures of cells. SPOTlight is based on finding topic profile signatures, by means of an NMFreg model, for each cell type and finding which combination fits best the spot we want to deconvolute.

image.png

Graphical abstract

0.1 Installation

You can install the latest stable version from the GitHub repository SPOTlight with:

# install.packages("devtools")
devtools::install_github("https://github.com/MarcElosua/SPOTlight")
devtools::install_git("https://github.com/MarcElosua/SPOTlight")

Or the latest version in development by downloading the devel branch

devtools::install_github("https://github.com/MarcElosua/SPOTlight", ref = "devel")
devtools::install_git("https://github.com/MarcElosua/SPOTlight", ref = "devel")

0.2 Libraries

library(Matrix)
library(data.table)
library(Seurat)
library(SeuratData)
library(dplyr)
library(gt)
library(SPOTlight)
library(igraph)
library(RColorBrewer)

0.3 Load data

For the purpose of this tutorial we are going to use adult mouse brain data. The scRNAseq data can be downloaded here while the spatial data is the one put out publicly by 10X and the processed object can be downloaded using SeuratData as shown below.

Load single-cell reference dataset.

path_to_data <- system.file(package = "SPOTlight")
cortex_sc <- readRDS(glue::glue("{path_to_data}/allen_cortex_dwn.rds"))

Load Spatial data

if (! "stxBrain" %in% SeuratData::AvailableData()[, "Dataset"]) {
  # If dataset not downloaded proceed to download it
  SeuratData::InstallData("stxBrain")
}

# Load data
anterior <- SeuratData::LoadData("stxBrain", type = "anterior1")

0.4 Pre-processing

set.seed(123)
cortex_sc <- Seurat::SCTransform(cortex_sc, verbose = FALSE) %>%
  Seurat::RunPCA(., verbose = FALSE) %>%
  Seurat::RunUMAP(., dims = 1:30, verbose = FALSE)

Visualize the clustering

Seurat::DimPlot(cortex_sc,
                group.by = "subclass",
                label = TRUE) + Seurat::NoLegend()

image.png

0.5 Descriptive

cortex_sc@meta.data %>%
  dplyr::count(subclass) %>%
  gt::gt(.[-1, ]) %>%
  gt::tab_header(
    title = "Cell types present in the reference dataset",
  ) %>%
  gt::cols_label(
    subclass = gt::html("Cell Type")
  )

Cell types present in the reference dataset

---
Cell Type	n
---	---
Astro	70
CR	7
Endo	70
L2/3 IT	70
L4	70
L5 IT	70
L5 PT	70
L6 CT	70
L6 IT	70
L6b	70
Lamp5	70
Macrophage	51
Meis2	45
NP	70
Oligo	70
Peri	32
Pvalb	70
Serpinf1	27
SMC	55
Sncg	70
Sst	70
Vip	70
VLMC	67

0.6 Compute marker genes

To determine the most important marker genes we can use the function Seurat::FindAllMarkers which will return the markers for each cluster.

Seurat::Idents(object = cortex_sc) <- cortex_sc@meta.data$subclass
cluster_markers_all <- Seurat::FindAllMarkers(object = cortex_sc, 
                                              assay = "SCT",
                                              slot = "data",
                                              verbose = TRUE, 
                                              only.pos = TRUE)

saveRDS(object = cluster_markers_all,
        file = here::here("inst/markers_sc.RDS"))

0.6.1 SPOTlight Decomposition

set.seed(123)

spotlight_ls <- spotlight_deconvolution(
  se_sc = cortex_sc,
  counts_spatial = anterior@assays$Spatial@counts,
  clust_vr = "subclass", # Variable in sc_seu containing the cell-type annotation
  cluster_markers = cluster_markers_all, # Dataframe with the marker genes
  cl_n = 100, # number of cells per cell type to use
  hvg = 3000, # Number of HVG to use
  ntop = NULL, # How many of the marker genes to use (by default all)
  transf = "uv", # Perform unit-variance scaling per cell and spot prior to factorzation and NLS
  method = "nsNMF", # Factorization method
  min_cont = 0 # Remove those cells contributing to a spot below a certain threshold 
  )

saveRDS(object = spotlight_ls, file = here::here("inst/spotlight_ls.rds"))

Read RDS object

spotlight_ls <- readRDS(file = here::here("inst/spotlight_ls.rds"))

nmf_mod <- spotlight_ls[[1]]
decon_mtrx <- spotlight_ls[[2]]

0.6.2 Assess deconvolution

Before even looking at the decomposed spots we can gain insight on how well the model performed by looking at the topic profiles for the cell types.

The first thing we can do is look at how specific the topic profiles are for each cell type.

h <- NMF::coef(nmf_mod[[1]])
rownames(h) <- paste("Topic", 1:nrow(h), sep = "_")
topic_profile_plts <- SPOTlight::dot_plot_profiles_fun(
  h = h,
  train_cell_clust = nmf_mod[[2]])

topic_profile_plts[[2]] + ggplot2::theme(
  axis.text.x = ggplot2::element_text(angle = 90), 
  axis.text = ggplot2::element_text(size = 12))

image.png

0.7 Visualization

Join decomposition with metadata

# This is the equivalent to setting min_cont to 0.04
decon_mtrx_sub <- decon_mtrx[, colnames(decon_mtrx) != "res_ss"]
decon_mtrx_sub[decon_mtrx_sub < 0.08] <- 0
decon_mtrx <- cbind(decon_mtrx_sub, "res_ss" = decon_mtrx[, "res_ss"])
rownames(decon_mtrx) <- colnames(anterior)

decon_df <- decon_mtrx %>%
  data.frame() %>%
  tibble::rownames_to_column("barcodes")

anterior@meta.data <- anterior@meta.data %>%
  tibble::rownames_to_column("barcodes") %>%
  dplyr::left_join(decon_df, by = "barcodes") %>%
  tibble::column_to_rownames("barcodes")

0.7.1 Specific cell-types

we can use the standard Seurat::SpatialFeaturePlot to view predicted celltype proportions one at a time.

Seurat::SpatialFeaturePlot(
  object = anterior,
  features = c("L2.3.IT", "L6b", "Meis2", "Oligo"),
  alpha = c(0.1, 1))

图片.png

0.7.2 Spatial scatterpies

Plot spot composition of all the spots.

cell_types_all <- colnames(decon_mtrx)[which(colnames(decon_mtrx) != "res_ss")]

SPOTlight::spatial_scatterpie(se_obj = anterior,
                              cell_types_all = cell_types_all,
                              img_path = "sample_data/spatial/tissue_lowres_image.png",
                              pie_scale = 0.4)

image.png

Plot spot composition of spots containing cell-types of interest

SPOTlight::spatial_scatterpie(se_obj = anterior,
                              cell_types_all = cell_types_all,
                              img_path = "sample_data/spatial/tissue_lowres_image.png",
                              cell_types_interest = "L6b",
                              pie_scale = 0.8)

image.png

0.7.3 Spatial interaction graph

Now that we know which cell types are found within each spot we can make a graph representing spatial interactions where cell types will have stronger edges between them the more often we find them within the same spot. To do this we will only need to run the function get_spatial_interaction_graph, this function prints the plot and returns the elements needed to plot it.

graph_ntw <- SPOTlight::get_spatial_interaction_graph(decon_mtrx = decon_mtrx[, cell_types_all])

If you want to tune how the graph looks you can do the following or you can check out more options here:

deg <- degree(graph_ntw, mode="all")

# Get color palette for difusion
edge_importance <- E(graph_ntw)$importance

# Select a continuous palette
qual_col_pals <- brewer.pal.info[brewer.pal.info$category == 'seq',]

# Create a color palette
getPalette <- colorRampPalette(brewer.pal(9, "YlOrRd"))

# Get how many values we need
grad_edge <- seq(0, max(edge_importance), 0.1)

# Generate extended gradient palette dataframe
graph_col_df <- data.frame(value = as.character(grad_edge),
                           color = getPalette(length(grad_edge)),
                           stringsAsFactors = FALSE)
# Assign color to each edge
color_edge <- data.frame(value = as.character(round(edge_importance, 1)), stringsAsFactors = FALSE) %>%
  dplyr::left_join(graph_col_df, by = "value") %>%
  dplyr::pull(color)

# Open a pdf file
plot(graph_ntw,
     # Size of the edge
     edge.width = edge_importance,
     edge.color = color_edge,
     # Size of the buble
     vertex.size = deg,
     vertex.color = "#cde394",
     vertex.frame.color = "white",
     vertex.label.color = "black",
     vertex.label.family = "Ubuntu", # Font family of the label (e.g.“Times”, “Helvetica”)
     layout = layout.circle)

image.png

Lastly one can compute cell-cell correlations to see groups of cells that correlate positively or negatively.

# Remove cell types not predicted to be on the tissue
decon_mtrx_sub <- decon_mtrx[, cell_types_all]
decon_mtrx_sub <- decon_mtrx_sub[, colSums(decon_mtrx_sub) > 0]

# Compute correlation
decon_cor <- cor(decon_mtrx_sub)

# Compute correlation P-value
p.mat <- corrplot::cor.mtest(mat = decon_mtrx_sub, conf.level = 0.95)

# Visualize
ggcorrplot::ggcorrplot(
  corr = decon_cor,
  p.mat = p.mat[[1]],
  hc.order = TRUE,
  type = "full",
  insig = "blank",
  lab = TRUE,
  outline.col = "lightgrey",
  method = "square",
  # colors = c("#4477AA", "white", "#BB4444"))
  colors = c("#6D9EC1", "white", "#E46726"),
  title = "Predicted cell-cell proportion correlation",
  legend.title = "Correlation\n(Pearson)") +
  ggplot2::theme(
    plot.title = ggplot2::element_text(size = 22, hjust = 0.5, face = "bold"),
    legend.text = ggplot2::element_text(size = 12),
    legend.title = ggplot2::element_text(size = 15),
    axis.text.x = ggplot2::element_text(angle = 90),
    axis.text = ggplot2::element_text(size = 18, vjust = 0.5))

image.png

0.8 Step-by-Step insight

Here we are going to show step by step what is going on and all the different steps involved in the process.

image.png

还是那句话，生活很好，有你更好

最后编辑于：2021.09.03 17:43:32

禁止转载，如需转载请通过简信或评论联系作者。

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 194,524评论 5赞 460
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 81,869评论 2赞 371
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 141,813评论 0赞 320
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 52,210评论 1赞 263
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 61,085评论 4赞 355
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 46,117评论 1赞 272
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 36,533评论 3赞 381
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 35,219评论 0赞 253
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 39,487评论 1赞 290
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 34,582评论 2赞 309
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 36,362评论 1赞 326
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 32,218评论 3赞 312
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 37,589评论 3赞 299
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 28,899评论 0赞 17
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 30,176评论 1赞 250
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 41,503评论 2赞 341
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 40,707评论 2赞 335

10X单细胞空间联合分析之三----Spotlight

软件联合分析的原理

软件的优势

技术示意图

接下来就是一些运用了，包括文献本身的验证和空间数据上的运用，大家看看即可， 肯定效果很不错，要不然发不出来，。

最后我们来看一下代码部分：

代码部分

0.1 Installation

0.2 Libraries

0.3 Load data

0.4 Pre-processing

0.5 Descriptive

0.6 Compute marker genes

0.6.1 SPOTlight Decomposition

0.6.2 Assess deconvolution

0.7 Visualization

0.7.1 Specific cell-types

0.7.2 Spatial scatterpies

0.7.3 Spatial interaction graph

0.8 Step-by-Step insight

推荐阅读更多精彩内容

接下来就是一些运用了，包括文献本身的验证和空间数据上的运用，大家看看即可，肯定效果很不错，要不然发不出来，。