Seurat v4 与 v5 数据整合工作流的技术差异说明

批次效应 是指在实验中由于不同的实验条件、操作人员、试剂批次、设备差异等非生物学因素引起的系统性差异。具体来说，在单细胞RNA测序（scRNA-seq）中，不同的实验批次可能会产生显著的技术变异，这些变异会掩盖实际的生物学信号，导致错误的分析结果。直观来看，批次效应表现为，在整体数据集进行UMAP 和/或 tSNE 降维分析中，不同供体的数据不能 “很好地混合”，未经校正的批次效应会导致不同批次的样本聚集在一起，而不是根据细胞类型或状态进行聚类。如果在此基础上进行更深的数据挖掘，就很可能会获得虚假差异（例如来自单一供体的细胞群）。单细胞测序数据集的 整合分析（例如跨实验批次、供体或条件） 通常是 scRNA-seq 工作流程中的关键步骤。通过综合分析，可以匹配不同数据集中的共享细胞类型和状态，这不仅提高了统计能力，更重要的是，促进了跨数据集的准确比较分析。在单细胞分析的各种工作流中，Seurat 分析套件是目前应用最广泛的方法之一。随着近年来 Seurat 团队的不断更新，该分析框架已经发展到了 v5 版本。相较于之前的版本，Seurat v5 在数据结构和分析方法上都进行了许多重要的调整和改进。本帖收集了一些资料特别对新旧版本的数据整合工作流之间的一些差异进行了汇总。

Seurat 标准整合工作流：Analysis, visualization, and integration of Visium HD spatial datasets with Seurat • Seurat (satijalab.org)

1、Seurat v4 Integration Work-flow

在 Seurat v4 整合工作流程中，每个整合步骤中发生的情况非常清楚，如每个命令的文档页面底部所记录。

1) SelectIntegrationFeatures()
Summary:
Choose the features to use when integrating multiple datasets. This function ranks features by the number of datasets they are deemed variable in, breaking ties by the median variable feature rank across datasets. It returns the top scoring features by this ranking.
2) FindIntegrationAnchors()
Summarized as this:
Perform dimensional reduction on the dataset pair as specified via the reduction parameter. If l2.norm is set to TRUE, perform L2 normalization of the embedding vectors.

Identify anchors - pairs of cells from each dataset that are contained within each other's neighborhoods (also known as mutual nearest neighbors).

Filter low confidence anchors to ensure anchors in the low dimension space are in broad agreement with the high dimensional measurements. This is done by looking at the neighbors of each query cell in the reference dataset using max.features to define this space. If the reference cell isn't found within the first k.filter neighbors, remove the anchor.

Assign each remaining anchor a score. For each anchor cell, determine the nearest k.score anchors within its own dataset and within its pair's dataset. Based on these neighborhoods, construct an overall neighbor graph and then compute the shared neighbor overlap between anchor and query cells (analogous to an SNN graph). We use the 0.01 and 0.90 quantiles on these scores to dampen outlier effects and rescale to range between 0-1

3)IntegrateData()
Summary:
For pairwise integration:

Construct a weights matrix that defines the association between each query cell and each anchor. These weights are computed as 1 - the distance between the query cell and the anchor divided by the distance of the query cell to the k.weightth anchor multiplied by the anchor score computed in FindIntegrationAnchors. We then apply a Gaussian kernel width a bandwidth defined by sd.weight and normalize across all k.weight anchors.

Compute the anchor integration matrix as the difference between the two expression matrices for every pair of anchor cells

Compute the transformation matrix as the product of the integration matrix and the weights matrix.

Subtract the transformation matrix from the original expression matrix.

It is clear then that the output of this integration workflow is a corrected matrix used for downstream analysis.

1.1 Seurat v4 Log-Normalization Based Integration

{
    pacman::p_load(Seurat,SeuratData,dplyr)
   options(future.globals.maxSize = 1e+12)

    obt.list <- SplitObject(ifnb, split.by = "stim")
    features <- SelectIntegrationFeatures(object.list = obt.list)
    set.anchors <- FindIntegrationAnchors(object.list = obt.list, anchor.features = features)
    set.combined <- IntegrateData(anchorset = set.anchors)
    
    DefaultAssay(set.combined) <- 'integrated'
    set.combined <- set.combined %>% RunUMAP(dims = 1:30,min.dist = 1e-5) 
}

1.2 Seurat v4 SCT-Normalization Based Integration

{
    pacman::p_load(Seurat,SeuratData,dplyr)
   options(future.globals.maxSize = 1e+12)

    ifnb <- UpdateSeuratObject(LoadData("ifnb"))

    seuObj.list <- SplitObject(ifnb, split.by = "stim")
    lapply(seuObj.list, function(obj){
        obj %>% SCTransform(verbose = FALSE) %>%
            RunPCA(verbose = FALSE)
    })
    
    features <- SelectIntegrationFeatures(object.list = seuObj.list, nfeatures = 3000)
    seuObj.list <- PrepSCTIntegration(object.list = seuObj.list, anchor.features = features)
    ### default reduction is CCA
    anchors <- FindIntegrationAnchors(object.list = seuObj.list, normalization.method = "SCT",anchor.features = features)
    seuObj.combined.sct <- IntegrateData(anchorset = anchors, normalization.method = "SCT")
    seuObj.combined.sct <- seuObj.combined.sct %>% 
        RunPCA( verbose = T) %>% 
        RunUMAP(dims = 1:30,min.dist = 1e-3, verbose = F) 
}

2、Seurat v5 Integration Work-flow

在 Seurat v5 整合工作流程中，相比起 v4，开发团队引入了 Layer 数据结构，并且工作流程从代码/可用性角度进行了简化，但 v5总体步骤与v4工作流程相同。该流程的主要区别在于，在 Seurat v5 中，校正是在整个对象的全部细胞的 low-dimensional space 上进行的（默认为 PCA 空间），而不是将基因表达值本身作为校正的输入，并且不再返回 integrated assay，而是直接返回一个校正的低维空间 (默认为 integrated.dr)。值得注意的是，在生成整个对象的PCA空间的时候，Seurat v5 更新了 FindVariableFeatures 的方法，现流程在一个包含 2+ Layers 的数据对象中查找 可变特征 时，会分别识别每一层的可变特征，然后找出共同的可变特征。接着，使用每个 Layer 中最可变且在其他矩阵中也存在的特征，补充进来直到达到所需的 n 个可变特征（这与Seurat v4中用于整合数据时识别特征的方法 SelectIntegrationFeatures 相同），从而确保整合后的数据能够准确反映细胞之间的生物学差异，而不是技术差异。

此外，如果研究确实依赖 integrated assay，Seurat v5 仍然支持旧的整合工作流程（使用 IntegrateData 而不是 IntegrateLayers）。

见讨论：

2.1 Seurat v5 Log-Normalization Based Integration

{
    pacman::p_load(Seurat,SeuratData,dplyr)
    options(future.globals.maxSize = 1e+12)

    ifnb <- UpdateSeuratObject(LoadData("ifnb"))
    ifnb[["RNA"]] <- split(ifnb[["RNA"]], f = ifnb$stim)
    ifnb <- ifnb %>% NormalizeData(verbose = F) %>% 
        FindVariableFeatures(verbose = F) %>% 
        ScaleData(verbose = F) %>% 
        RunPCA(verbose = F) %>% 
        IntegrateLayers(method = CCAIntegration,verbose = FALSE) %>% 
        RunUMAP(dims = 1:30,reduction = "integrated.dr",min.dist = 1e-3, verbose = F)
    
    # re-join layers after integration
    ifnb[["RNA"]] <- JoinLayers(ifnb[["RNA"]])
}

2.2 Seurat v5 SCT-Normalization Based Integration

pacman::p_load(Seurat,SeuratData,dplyr,ggplot2)
options(future.globals.maxSize = 1e+12)

{
    ifnb[["RNA"]] <- split(ifnb[["RNA"]], f = ifnb$stim)
    ifnb <- ifnb %>% SCTransform(verbose = F) %>% 
        RunPCA(verbose = F) %>% 
        IntegrateLayers(method = CCAIntegration, normalization.method = "SCT", verbose = F) %>% 
        RunUMAP(dims = 1:30,reduction = "integrated.dr",min.dist = 1e-3, verbose = F)
}

3、Compare Result of Seurat v4 & v5 SCT-Normalization Based Integration

通过对CCA方法校正后的UMAP表示进行目视检查，Seurat v4和v5 版本都能比较好的校正不同技术平台引入的技术偏差，并且保留了细胞类型的可区分性。

DimPlot(seuObj.combined.sct,group.by = "seurat_annotations",label = T) & labs(title = "V4 Integ") | DimPlot(ifnb,group.by = "seurat_annotations",label = T) & labs(title = "V5 Integ")

Reference

Seurat -- SCTransform_> e14 <- sctransform(object = e14) running sctrans
https://satijalab.org/seurat/archive/v4.3/integration_introduction
Unclear what IntegrateLayers (v5) is actually doing vs IntegrateData (v4) · Issue #8653 · satijalab/seurat (github.com)

最后编辑于：2024.07.24 17:01:51

禁止转载，如需转载请通过简信或评论联系作者。

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 203,547评论 6赞 477
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 85,399评论 2赞 381
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 150,428评论 0赞 337
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,599评论 1赞 274
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,612评论 5赞 365
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,577评论 1赞 281
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 37,941评论 3赞 395
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,603评论 0赞 258
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,852评论 1赞 297
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,605评论 2赞 321
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,693评论 1赞 329
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,375评论 4赞 318
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,955评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,936评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,172评论 1赞 259
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 43,970评论 2赞 349
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,414评论 2赞 342