Seurat的打分函数AddMouduleScore

在读张泽民老师发表的新冠文献的时候看到计算免疫细胞的cytokine score或inflammatory score使用了Seurat包的AddMouduleScore函数,在计算细胞周期的CellCycleScoring函数的原代码中也使用了这个函数。

此功能可用于计算任何基因列表的监督模块评分,非常实用!!!

1. 查看一下这个函数的用法

library(Seurat)
?AddModuleScore

2. 使用

输入数据:Seurat对象和一个gene list。

  • 1 准备矩阵
library(tidyverse)
library(Matrix)
library(cowplot)
pbmc <- readRDS("pbmc.rds") 
  • 2 准备想要计算的gene list
inflammatory_gene <- readxl::read_xlsx("inflammatory_gene.xlsx")
View(inflammatory_gene)
#转换成list
gene <- as.list(inflammatory_gene)
  • 3 计算
pbmc <- AddModuleScore(
    object = pbmc,
    features = gene,
    ctrl = 100, #默认值是100
    name = 'CD_Features'
)

计算结果保存在pbmc@meta.data[["CD_Features1"]]
得到的score是在每个细胞中算出来的我们感兴趣的基因的表达均值
其在使用时需要先计算基因集中所有基因的平均值,再根据平均值把表达矩阵切割成若干个bin,然后从切割后的每一个bin随机抽取对照基因(基因集外的基因,默认100个)作为背景值。最后所有的目标基因算一个平均值,所有的背景基因算一个平均值,两者相减就是该gene set的score值。因此,在整合不同样本的情况下,即使使用相同基因集为相同细胞打分,也会产生不同的富集评分。从本质上看它和Zscore的算法很类似,Zscore又称Z值,原是一个统计学概念,表示的是个体测量值X以标准差σ为单位,偏离总体均数μ的距离,即:Z score=(X-μ)/σ。牵扯到统计学的概念不免有些难以理解,简单说它就是处理过的平均值。

colnames(pbmc@meta.data)
# [1] "orig.ident"      "nCount_RNA"      "nFeature_RNA"   
# [4] "percent.mt"      "RNA_snn_res.0.5" "seurat_clusters"
# [7] "cell_type"       "CD_Features1" 
colnames(pbmc@meta.data)[8] <- 'Inflammatory_Score' 
VlnPlot(pbmc,features = 'Inflammatory_Score')

绘制箱线图

data<- FetchData(pbmc,vars = c("cell_type","Inflammatory_Score"))
p <- ggplot(data,aes(cell_type,Inflammatory_Score))
p+geom_boxplot()+theme_bw()+RotatedAxis()

绘制分数的分布图

library(ggplot2)
mydata<- FetchData(pbmc,vars = c("UMAP_1","UMAP_2","Inflammatory_Score"))
a <- ggplot(mydata,aes(x = UMAP_1,y =UMAP_2,colour = Inflammatory_Score))+geom_point(size = 1)+scale_color_gradientn(values = seq(0,1,0.2),colours = c('blue','cyan','green','yellow','orange','red'))

a+ theme_bw() + theme(panel.border = element_blank(), panel.grid.major = element_blank(),
                      panel.grid.minor = element_blank(), axis.line = element_line(colour = "black"))

打分是一个连续变化的值,我们也可以直接将其转换为一定的分类变量:

#按照均值分
pbmc[["stage"]] <- ifelse(pbmc@meta.data[,"Inflammatory_Score"] > mean(pbmc@meta.data[,"Inflammatory_Score"]),"High","Low")
rownames(pbmc@meta.data)
#按照75%分
pbmc[["stage"]] <-pbmc@meta.data[,"Inflammatory_Score"] <- ifelse(pbmc@meta.data[,"Inflammatory_Score"] > quantile(pbmc@meta.data[,"Inflammatory_Score"],0.75),"High","Low")
#按照具体数值分
pbmc[["stage"]] <-pbmc@meta.data[,"Inflammatory_Score"] <- ifelse(pbmc@meta.data[,"Inflammatory_Score"] > 0.5,"High","Low")
Idents(pbmc) <- 'Inflammatory_Score'
DimPlot(pbmc,reduction = "umap",label = TRUE,pt.size = 1.5)

3. 主要参数

参数 意义
object Seurat object
features A list of vectors of features for expression programs; each entry should be a vector of feature names
pool List of features to check expression levels against, defaults to rownames(x = object)
nbin Number of bins of aggregate expression levels for all analyzed features
ctrl Number of control features selected from the same bin per analyzed feature
k Use feature clusters returned from DoKMeans
assay Name of assay to use
name Name for the expression programs; will append a number to the end for each entry in features (eg. if features has three programs, the results will be stored as name1, name2, name3, respectively)
seed Set a random seed. If NULL, seed is not set.
search Search for symbol synonyms for features in features that don't match features in object? Searches the HGNC's gene names database; see UpdateSymbolList for more details
... Extra parameters passed to UpdateSymbolList

4. 函数代码

AddModuleScore
function (object, features, pool = NULL, nbin = 24, ctrl = 100, 
    k = FALSE, assay = NULL, name = "Cluster", seed = 1, search = FALSE, 
    ...) 
{
    if (!is.null(x = seed)) {
        set.seed(seed = seed)
    }
    assay.old <- DefaultAssay(object = object)
    assay <- assay %||% assay.old
    DefaultAssay(object = object) <- assay
    assay.data <- GetAssayData(object = object)
    features.old <- features
    if (k) {
        .NotYetUsed(arg = "k")
        features <- list()
        for (i in as.numeric(x = names(x = table(object@kmeans.obj[[1]]$cluster)))) {
            features[[i]] <- names(x = which(x = object@kmeans.obj[[1]]$cluster == 
                i))
        }
        cluster.length <- length(x = features)
    }
    else {
        if (is.null(x = features)) {
            stop("Missing input feature list")
        }
        features <- lapply(X = features, FUN = function(x) {
            missing.features <- setdiff(x = x, y = rownames(x = object))
            if (length(x = missing.features) > 0) {
                warning("The following features are not present in the object: ", 
                  paste(missing.features, collapse = ", "), ifelse(test = search, 
                    yes = ", attempting to find updated synonyms", 
                    no = ", not searching for symbol synonyms"), 
                  call. = FALSE, immediate. = TRUE)
                if (search) {
                  tryCatch(expr = {
                    updated.features <- UpdateSymbolList(symbols = missing.features, 
                      ...)
                    names(x = updated.features) <- missing.features
                    for (miss in names(x = updated.features)) {
                      index <- which(x == miss)
                      x[index] <- updated.features[miss]
                    }
                  }, error = function(...) {
                    warning("Could not reach HGNC's gene names database", 
                      call. = FALSE, immediate. = TRUE)
                  })
                  missing.features <- setdiff(x = x, y = rownames(x = object))
                  if (length(x = missing.features) > 0) {
                    warning("The following features are still not present in the object: ", 
                      paste(missing.features, collapse = ", "), 
                      call. = FALSE, immediate. = TRUE)
                  }
                }
            }
            return(intersect(x = x, y = rownames(x = object)))
        })
        cluster.length <- length(x = features)
    }
    if (!all(LengthCheck(values = features))) {
        warning(paste("Could not find enough features in the object from the following feature lists:", 
            paste(names(x = which(x = !LengthCheck(values = features)))), 
            "Attempting to match case..."))
        features <- lapply(X = features.old, FUN = CaseMatch, 
            match = rownames(x = object))
    }
    if (!all(LengthCheck(values = features))) {
        stop(paste("The following feature lists do not have enough features present in the object:", 
            paste(names(x = which(x = !LengthCheck(values = features)))), 
            "exiting..."))
    }
    pool <- pool %||% rownames(x = object)
    data.avg <- Matrix::rowMeans(x = assay.data[pool, ])
    data.avg <- data.avg[order(data.avg)]
    data.cut <- cut_number(x = data.avg + rnorm(n = length(data.avg))/1e+30, 
        n = nbin, labels = FALSE, right = FALSE)
    names(x = data.cut) <- names(x = data.avg)
    ctrl.use <- vector(mode = "list", length = cluster.length)
    for (i in 1:cluster.length) {
        features.use <- features[[i]]
        for (j in 1:length(x = features.use)) {
            ctrl.use[[i]] <- c(ctrl.use[[i]], names(x = sample(x = data.cut[which(x = data.cut == 
                data.cut[features.use[j]])], size = ctrl, replace = FALSE)))
        }
    }
    ctrl.use <- lapply(X = ctrl.use, FUN = unique)
    ctrl.scores <- matrix(data = numeric(length = 1L), nrow = length(x = ctrl.use), 
        ncol = ncol(x = object))
    for (i in 1:length(ctrl.use)) {
        features.use <- ctrl.use[[i]]
        ctrl.scores[i, ] <- Matrix::colMeans(x = assay.data[features.use, 
            ])
    }
    features.scores <- matrix(data = numeric(length = 1L), nrow = cluster.length, 
        ncol = ncol(x = object))
    for (i in 1:cluster.length) {
        features.use <- features[[i]]
        data.use <- assay.data[features.use, , drop = FALSE]
        features.scores[i, ] <- Matrix::colMeans(x = data.use)
    }
    features.scores.use <- features.scores - ctrl.scores
    rownames(x = features.scores.use) <- paste0(name, 1:cluster.length)
    features.scores.use <- as.data.frame(x = t(x = features.scores.use))
    rownames(x = features.scores.use) <- colnames(x = object)
    object[[colnames(x = features.scores.use)]] <- features.scores.use
    CheckGC()
    DefaultAssay(object = object) <- assay.old
    return(object)
}
<bytecode: 0x7fc8df13f008>
<environment: namespace:Seurat>

5. 参考文献:

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
禁止转载,如需转载请通过简信或评论联系作者。
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 194,491评论 5 459
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 81,856评论 2 371
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 141,745评论 0 319
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 52,196评论 1 263
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 61,073评论 4 355
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 46,112评论 1 272
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 36,531评论 3 381
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 35,215评论 0 253
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 39,485评论 1 290
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 34,578评论 2 309
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 36,356评论 1 326
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 32,215评论 3 312
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 37,583评论 3 299
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 28,898评论 0 17
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 30,174评论 1 250
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 41,497评论 2 341
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 40,697评论 2 335

推荐阅读更多精彩内容