10X单细胞通讯分析之ICELLNET

今天我们来分享一下新的10X单细胞通讯分析的方法,名字是ICELLNET,文章在Dissection of intercellular communication using the transcriptome-based framework ICELLNET,2021年2月份发表于NC,影响因子12分,我们先分享文章,最后来看参考代码。

abstract

这部分首先介绍了细胞通讯存在的挑战
1、global integration of cell-to-cell communication
2、biological interpretation
3、application to individual cell population transcriptomic profiles、
这几部分是细胞通讯问题的共识,当然,总是会有解决办法的。
其次介绍了软件ICELLNET的优势
1、an original expert-curated database of ligand–receptor interactions accounting for multiple subunits expression(配受体数据库,多亚基的情况,跟cellphoneDB相似)。
2、quantification of communication scores(通讯强度,这个大部分软件都会有)
3、the possibility to connect a cell population of interest with 31 reference human cell types;(这一点不是很理解,我们往下看看)。
4、three visualization modes to facilitate biological interpretation.(可视化)
最后自夸一下ICELLNET is a global, versatile, biologically validated, and easy-to-use framework to dissect cell communication from individual or multiple cell-based transcriptomic profiles(熟悉的味道,😄)

introduction

我们来看一下这部分的重点
Most studies in the past decades have focused on a limited number of communication molecules in a given anatomical site or physiological process.The availability of large-scale transcriptomic datasets from several cell types, tissue locations, and cell activation states, opened the possibility of reconstructing cell-to-cell interactions based on the expression of specific ligand–receptor pairs on sender and target cells, respectively.(这也是目前我们通常的做法)。Many of them exploit single-cell RNA-seq datasets to infer communication between groups of cells within the same dataset(这一点已经改进了,配受体数据库已经扩充了很多)。Despite leading to interesting and often innovative hypotheses,these methods do not integrate putative signals that may come from more distant cells.(这个地方介绍细胞通讯跟距离有关,当然有关系,分泌型的配体有半衰期,但作者怎么分析,我们往下看看)。Also, they cannot be applied to bulk transcriptomic data derived from a given cell population(这个当然是)。Such datasets are numerous in public databases, and can be a source of novel insights into how each cell type may send or receive communication signals.
Another important aspect when inferring cell-to-cell communication is the use of databases of ligand–receptor interactions(配受体库的选择). Some are very broad with over 2000 ligand–receptor pairs(例如NATMI),but lack systematic manual or expert curation(这个有的数据库很多配受体都没有验证过)。which may impact the quality and biological relevance of the annotation.Others include lower numbers of ligand–receptor pairs and provide manually curated information from the literature,without necessarily providing systematic combinatorial rules for the association of protein subunits into multimeric ligands or receptors(复合物的考虑,这个当然首推celltalker)。
The last point relates to the granularity that is structuring the biological information into families and subfamilies of functionally and structurally related molecules.(配受体的家族划分确实很多软件是不做的,我们只得到了配受体对,却没有大类划分) 。
In this study we develop ICELLNET, a versatile computational framework to infer cell-to-cell communication from a wide range of bulk and single-cell transcriptomic datasets. Each family of communication molecules is expert curated and organized into biologically relevant sub-families.(这个软件的优势,看来还是有进步的)。ICELLNET offers an array of visualization tools in order to facilitate biological interpretation and discoveries.(没有可视化的软件都是耍流氓)。

Result

第一部分Expert-curated database of ligand–receptor interactions

第一部分首先介绍数据库
数据库整合了literature and public databases(可以理解为数据库的扩充)。
数据库的特点:
1、robustness of the findings(稳定性)
2、consistency with international classifications and nomenclature(分类)
3、experimental validation of the functionality of the ligand–receptor interaction。(实验验证)。
4、We also used consensus reviews from leaders in the field(权威认证)
5、We did not include putative interactions based on protein–protein interaction predictions, as it is done in some other databases(推定的配受体要排除)
这几部分倒是很全面。
This led to the integration of 380 ligand–receptor interactions into the ICELLNET database(这个数据库是真的少)。Whenever relevant, we took into account the
multiple subunits of the ligands and the receptors。


图片.png

Interactions were classified into 6 major families of communication molecules, with a strong emphasis on inflammatory and immune processes: Growth factors, Cytokines, Chemokines, Immune Checkpoints, Notch signaling, and Antigen binding(分类倒是很不错)。


图片.png

Cytokine–receptor pairs were mapped in an exhaustive manner, by exploiting a series of reference articles and consensus classifications. They represent 50% of the total interactions included in the database (194 interactions), and were further classified into 7 sub-families according to structural protein motifs: type 1 cytokines, type 2 cytokines, IL-1 family, IL-17 family, TNF family, TGF-ß family, and RTK cytokines(对细胞因子进行了详细的划分)。
This database is integrating information on both multiple subunits of ligands and receptors, and a classification into molecular families/subfamilies.(数据库的优势还是很明显的,就是有点少)。
第二部分结果Development of a computational pipeline to dissect intercellular

communication.(新的算法)
In the ICELLNET framework, we developed an automatized tool in R script to infer communication between multiple cell types by integrating“
(1)prior knowledge on ligand–receptor interactions
(2)computation of a communication score between pairs of cells based on their
transcriptomic profiles(这个很大众)
(3)several visualization modes to guide results interpretation.(可视化,也很大众)
Quantification of intercellular communication was achieved by scoring the intensity of each ligand–receptor interaction between two cell types from their expression profiles


图片.png

这里我们需要重点注意配受体分数的算法,我们方法中分析。还有就是自己的样本会与数据库的基础分析结果进行比较。
From each transcriptomic profile, all genes or only differentially expressed genes could be used,(这个地方要注意,最好还是差异基因)and no filtering threshold
was applied to gene expression.the genes coding for ligands/receptors were
selected from all 380 interactions to compute the score, but it is also possible to restrict the database to specific families of molecules, depending on the biological question.(大众的做法)。
A unique feature and strength of ICELLNET is its ability to infer cell-to-cell communication even from an individual cell population-based transcriptome of interest(这不算优势了,很多软件可以做了)。ICELLNET separately considers other cell types with known transcriptomic profiles (hereafter called « partner cells ») that can connect to the central cell. These can be cell types coming from the same dataset as the central cell, or from any other transcriptomic dataset.(这个地方值得注意,对其他转录组数据和自己的目标细胞配受体推断,其中提到了Human Primary Cell Atlas)。This public dataset includes transcriptomic profiles of 31 human cell types including immune cells, stromal cells, neural cells, and tissuespecific cell types, all generated with the same Affymetrix technology(这就是abstract里面疑问的解答)。

第三部分结果Establishment of a score to assess the communication between

cells.
Since cell-to-cell communication is directional, we considered ligand expression from the central cell, and receptor expression from the partner cells in order to assess outward communication. Conversely, we then selected receptor expression from the central cell, and ligand expression from partner cells in order to assess inward communication。(很简单),接下来看一个炸弹,For each gene, expression levels were scaled by maximum of gene expression in the dataset, in order to avoid a communication score predominantly driven by highly expressed genes. Indeed, bioactivity of communication molecules varies a lot. Some cytokines, such as IL-12 and IL-4, are very bioactive at low concentrations, and often expressed at very low levels both in transcript and protein. Conversely, many chemokines are produced at much higher levels, without necessarily having a higher bioactivity. Not scaling the data before inferring a communication score would systematically favor a few highly expressed molecules, and would not allow detecting the contribution of important molecules expressed at much lower levels.(这个地方值得深思,我相信大家做细胞通讯都是使用均一化的数据,使用scale的数据非常少见,但作者给出的理由也很充分)。

分析结果第四部分ICELLNET offers different graphical representations allowing multiple layers of interpretation.(可视化)
图片.png

至于结果的后面都是一些应用案例了,不用看都知道效果不错,不然肯定发不出来。

Method,我们关注重点

Gene expression matrix scaling method. After selecting the genes corresponding to the ligands and/or receptors from the transcriptional profiles, each ligand/receptor gene expression is scaled by maximum of gene expression among all the conditions and then multiplied by 10, to have values ranging from 0 to 10(这个地方要不要借鉴值得一试). For each gene, the maximum value (10) is defined as the mean of expression of the 5% highest values of expression for RNA-seq and microarray datasets. Outliers are rescaled at 10 if above maximum value.
接下来是难点,score的计算Intercellular communication score computation.
To score the intensity of a particular ligand–receptor interaction between a central cell and a given partner cell, we considered the product of the expression of the ligand in the central cell and of the cognate receptor in the partner cells.Formally, if lij is the average expression level of ligand i by the central cell in the experimental condition j, and rik
is the average expression of the corresponding receptor by cell type k, the intensity sij,k of the corresponding interaction was quantified by sij,k = rik * lij (还是平均值相乘)。For interactions requiring
multiple components of the ligand and/or of the receptor, we considered a geometric
average of the receptor components.(对于复合物的计算采用复合物表达的平均值,这个相对于cellphoneDB的方法有差异)。To assign a global score Sj,k to the communication between the central cell in the condition j and cell type k, a composite score was defined by summing up the intensity of all the possible ligand–receptor interactions,

图片.png

图片.png

当然还有一些其他注意的地方,这里我们就不多说了。

最后,我们来看看实例代码

首先加载模块

library(BiocGenerics)
library("org.Hs.eg.db")
library("hgu133plus2.db")
library(jetset)
library(ggplot2)
library(dplyr)
library(icellnet)
library(gridExtra)
library(Seurat)

先来看一看配受体库

db=as.data.frame(read.csv(curl::curl(url="https://raw.githubusercontent.com/soumelis-lab/ICELLNET/master/data/ICELLNETdb.tsv"), sep="\t",header = T, check.names=FALSE, stringsAsFactors = FALSE, na.strings = ""))
图片.png

确实进行了详细的划分,值得学习。
1 - Load Seurat object

#Load data
seurat <- readRDS(file = "Lupus_Seurat_SingleCell_Landscape.Rds")
seurat <- NormalizeData(seurat)
seurat <- ScaleData(seurat)

#only for UMAP visualization, not for ICELLNET purpose
seurat <- FindVariableFeatures(seurat, selection.method = "vst", nfeatures = 2000)
seurat <- RunPCA(seurat)
seurat <- RunUMAP(seurat, dims = 1:50)
DimPlot(seurat, reduction = 'umap', group.by = 'author_annotation', label = T)

如果我们之前做过Seurat分析了,这一步不需要再进行这么多的处理了。

2 - Retrieve gene expression matrix

a - Compute manually average gene expression per cluster without filtering

# Taking into account the total nb of cells in each cluster
filter.perc=0
average.clean= sc.data.cleaning(object = seurat, db = db, filter.perc = filter.perc, save_file = T, path="path/", force.file = F)

b - Compute manually average gene expression per cluster with filtering for gene expression by a defined cell percentage at a cluster level

ICELLNET offers the possibility to filter the initial gene expression matrix to keep genes at least expressed by defined percentage of cell in their respective cluster (below 2%):

filter.perc=2
average.clean= sc.data.cleaning(object = seurat, db = db2, filter.perc = filter.perc, save_file = T, path="path/", force.file = F)

This filtering allows to remove all the genes that are expressed by a very low number of cells in some clusters, to avoid false negative cell-cell interactions scores. If you are applying ICELLNET for the first time on your dataset, we advice to apply first ICELLNET without filtering, and then with filtering at 2% to see the differences and filtered genes. This will help a lot in the analysis and for biological interpretation of the data(挑选配受体)。

3 - Apply icellnet pipeline on cluster of interest

In this example, we investigate cDC to T cell communication from CM3 cluster (= conventional dendritic cells, 82 cells), to either CT3b or CT0a clusters (CT3b=TFH-like cells, 50 cells ; CT0a = effector memory CD4+ T cells, 220 cells).

Format CC.data and PC.data and PC.target


data.icell=as.data.frame(gene.scaling(as.data.frame(average.clean), n=1, db=db))

PC.data=as.data.frame(data.icell[,c("CT3b","CT0a", "Symbol")], row.names = rownames(data.icell))

PC.target=data.frame("Class"=c("CT3b","CT0a"), "ID"= c("CT3b","CT0a"), "Cell_type"=c("CT3b","CT0a"))
rownames(PC.target)=c("CT3b","CT0a")

my.selection=c("CT3b","CT0a")

Compute intercellular communication scores

We investigate conventional dendritic cells (cDCs, CM3 cluster) to T cell (either CT3b or CT0a clusters) outward communication, so this means that we consider ligands expressed by cDCs and receptors expressed by T cells to compute intercellular communication scores. Outward communication -> direction = "out"

score.computation.1= icellnet.score(direction="out", PC.data=PC.data, 
                                    CC.data= as.data.frame(data.icell[,c("CM3")], row.names = rownames(data.icell)),  
                                    PC.target = PC.target, PC=my.selection, CC.type = "RNAseq", 
                                    PC.type = "RNAseq",  db = db)
score1=as.data.frame(score.computation.1[[1]])
lr1=score.computation.1[[2]]

Visualisation of contribution of family of molecules to communication scores

# label and color label if you are working families of molecules already present in the database
my.family=c("Growth factor","Chemokine","Checkpoint","Cytokine","Notch family","Antigen binding")
family.col = c( "Growth factor"= "#AECBE3", "Chemokine"= "#66ABDF", "Checkpoint"= "#1D1D18"  ,
            "Cytokine"="#156399", "Notch family" ="#676766", "Antigen binding" = "#12A039",  "other" = "#908F90",  "NA"="#908F90")

ymax=round(max(score1))+1 #to define the y axis range of the barplot

LR.family.score(lr=lr1, my.family=my.family, db.couple=db.name.couple, plot=F) # table of contribution of each family of molecule to the scores

LR.family.score(lr=lr1, my.family=my.family, db.couple=db.name.couple, plot=T, title="DC-T comm", family.col=family.col) #display barplot

图片.png

Visualisation of highest and most different interactions between the two conditions (selection of topn=30 interactions):

30 first most contributing interactions (sort.by="sum")

colnames(lr1)=c("CM3_to_CT3b", "CM3_to_CT0a")
LR.balloon.plot(lr = lr1, thresh = 0 , topn=30 , sort.by="sum",  db.name.couple=db.name.couple, family.col=family.col, title="Most contributing interactions")

图片.png

30 first most different interactions between the conditions (sort.by="var")

colnames(lr1)=c("CM3_to_CT3b", "CM3_to_CT0a")
LR.balloon.plot(lr = lr1, thresh = 0 , topn=30 , sort.by="var",  db.name.couple=db.name.couple, family.col=family.col, title="Most contributing interactions")

图片.png

Remarks on biological interpretation:
ICELLNET will always set, for each gene, maximum gene expression value at 10. Then, the maximum score that you can obtain for an individual interaction is 100 (10 for the ligand, 10 for the receptor).(注意分析得到的配受体强度)

This means that high interaction scores does not mean high expression. You should come back to the initial SeuratObject to look at individual gene expression, and that the ligand/receptor of interest if effectively expressed by the cluster.(得到的结果需要手动检验)

Filtering of genes expressed by each cluster according to cell percentage expressing the gene (= with counts >0) for each cluster can be an option to remove false-negative interactions scores. This can be done with the sc.data.clean function, by setting filter.perc to a defined value (2 for 2%, 5 for 5% etc...). Filtered genes (expressed by a number of cells among the cluster below the percentage) will be set to 0.(挑选基因)

最后圈图的画法

network.create(score1,PC.col = color)
图片.png

最后呢,问各位道友一句做细胞通讯是不是要对输入的矩阵做Scale呢?,不知道大家自己的答案是怎么样的

生活很好,有你更好

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
禁止转载,如需转载请通过简信或评论联系作者。
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 203,456评论 5 477
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 85,370评论 2 381
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 150,337评论 0 337
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 54,583评论 1 273
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 63,596评论 5 365
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 48,572评论 1 281
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 37,936评论 3 395
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 36,595评论 0 258
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 40,850评论 1 297
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 35,601评论 2 321
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 37,685评论 1 329
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 33,371评论 4 318
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 38,951评论 3 307
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 29,934评论 0 19
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 31,167评论 1 259
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 43,636评论 2 349
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 42,411评论 2 342

推荐阅读更多精彩内容