肺是人类的主要呼吸器官
,其中近端气道
和远端肺泡
分别负责空气传导和气体交换。然而,在人类肺发育的胚胎阶段,近端-远端模式的调节在很大程度上是未知的。在这里,我们使用单细胞 RNA 测序研究了受精后第 4-8 周(卡内基第 12-21 期)人类胚胎的早期肺发育,并获得了169,686 个细胞的转录组图谱
。我们在第 4 周观察到肺器官发生开始时近端和远端上皮细胞的可识别基因表达模式。此外,我们确定了近端 (例如 THRB 和 EGR3)
和远端 (例如 ETV1 和 SOX6)
上皮模式化的新型转录调节因子。进一步解剖揭示了各种基质细胞群,包括早期胚胎 BDNF+群体,提供了具有空间特异性的近端-远端模式生态位。此外,我们阐明了气道和血管平滑肌祖细胞在肺发育早期的细胞命运分叉和成熟。总之,我们的研究扩大了早期胚胎阶段人类肺发育生物学的范围。内在转录调节因子和新的生态位提供者的发现加深了对人类肺发育中上皮近端-远端模式的理解,为再生医学开辟了新的途径。
背景:
“内消呼肝胰,外表感神经。”呼吸系统源自内胚层,肺脏作为重要的组成,经历5个主要阶段!
肺在人类妊娠约 4 周时发育,是前肠内胚层的腹侧生长物,周围环绕着中胚层。在形态学上,发育计划经历五个不同的阶段:胚胎(4-7 周)、假腺(5-17 周)、小管(16-26 周)、囊状(26-38 周)和肺泡(36 周-3 岁)
。在此过程中,肺内胚层祖细胞分化为近端气道上皮细胞类型,例如纤毛细胞、基底细胞和分泌细胞,以及远端肺泡细胞类型。近端-远端模式的确定对于肺形态发生至关重要。在小鼠肺发育的假腺阶段,Sox2 + 和 Sox9 + 细胞分别主要位于近端和远端上皮。然而,在人类肺发育中,近端上皮表达 SOX2
,而远端上皮共表达 SOX2 和 SOX9
。
-
res1: The major components required for lung development are readily available at the initiation of lung formation
人类胚胎肺进行了集中采样,涵盖胚胎阶段和早期假腺阶段 图 1a
获得了高质量的 scRNA-seq 图谱,其中包含 169,686 个细胞,平均检测到 3764 个基因和 12,118 个独特分子标识符 (UMI)
注释了六个人胚胎肺细胞簇,包括肺基质细胞、上皮祖细胞、神经嵴祖细胞、内皮祖细胞、原红细胞和巨噬细胞 图1b-d
基质细胞在整个采样期间所占比例最高,其次是上皮细胞、神经嵴和内皮细胞(图1d)
有趣的是,我们观察到所有六种细胞类型早在第 4 周(CS12)就出现了
,揭示了一种易于组装的机制,并表明肺器官发生是通过复杂的细胞串扰启动的 图1e
-
res2: 转录因子调节上皮细胞的近远端模式。
上皮细胞一直是六种主要肺细胞类型中的主要研究焦点。对上皮细胞簇(基质细胞的第二大细胞簇(17,860 个细胞))进行了深入分析。
我们将我们的数据集(最早的时间点)与从第 11 周到成人的公开数据集进行了整合。第 4-8 周是一个分支点
,大多数近端祖细胞(例如基底细胞和纤毛细胞)和远端祖细胞(例如肺泡 1 型和 2 型细胞(AT1 和 AT2))轨迹已经出现。
使用单细胞调控网络推理和聚类 (SCENIC) 计算第 4-8 周期间的 TF 基因(调节子)活性。接下来,使用 Waddington-OT20 (WOT) 通过计算每个相邻时间点之间的细胞间相似性来推断近端和远端细胞命运的评分。最后,根据每个时间点的 WOT 轨迹得分获得了调节子活动的加权和
。我们的研究强调了第 4-8 周期间近端-远端模式形成的分子特征,远远早于小管阶段远端上皮分化的第一个形态学标志。
上皮细胞由 5 个群体组成,包括高度表达已知近端/远端标记物的亚型 SOX2 (EPI_SOX2hi/ETV5lo) 和 SOX9/ETV5 (EPI_SOX9hi/ETV5hi)
RNA 速度分析29表明 EPI_SOX2hi/ETV5lo 和 EPI_SOX9hi/ETV5hi群体源自增殖亚型 EPI_DTL+ 和 EPI_TOP2A+,表明肺上皮祖细胞是预期近端/远端上皮的来源
事实上,在 EPI_SOX2hi/ETV5lo 和 EPI_SOX9hi/ETV5hi 中观察到可辨别的近端-远端表达模式第 4 周的细胞(图 2e、f)
SOX2 和 SOX9 在两个群体中共表达
,直到自第6
周起 EPI_SOX2hi/ETV5lo 细胞中远端因子大幅下调(图 2e ,f),表明近端规范在第 6 周左右开始。一致地,SOX2 的转录活性在第 6 周激增(图2c)。
-
res3 基质细胞有助于上皮近端-远端模式的多样化微环境
基质细胞簇是第 4-8 周肺细胞类型中最大的群体(图1 D)这些基质细胞如何影响胚胎阶段的人肺上皮发育。
根据基质细胞的分子特征鉴定了 7 种基质细胞亚型,包括5 种基质亚型
和2 种平滑肌细胞 (SMC) 亚型
(即气道 SMC (ASMC) 和血管 SMC (VSMC))
由于缺乏单细胞研究,早期基质群体的异质性以前被低估了
。
为了进一步解析这些基质细胞的空间特性,特别是它们与上皮的接近程度,我们通过 10× Visium 技术对 6 周的肺组织切片进行了空间转录组学分析. 上皮亚型 EPI_SOX2hi/ETV5lo 和 EPI_SOX9hi/ETV5hi 及其分子标志物分别在近端和远端上皮中的分布(图 3 g)
ASMC 分布在上皮周围,而 VSMC 分散,没有任何典型模式。smiFISH 分别用标记物 COL9A2/TCF21 和 MYH11/LIMS2 验证了 + 和 ASMC 的空间位置(图 3h)。值得注意的是,LIMS2 是本研究中新发现的 ASMC 标志物。这些结果证明了基质亚型分布的特异性及其在胚胎肺器官发生过程中作为生态位提供者的不同作用
。
我们在主要肺细胞类型之间进行了配体-受体相互作用分析
基质细胞和内皮细胞与上皮细胞的连通性最高,反映了协调气道分支形态发生的基质衍生微环境。
基质亚型对上皮生态位的贡献。有趣的是,我们观察到 BDNF+ 基质亚型仅由 7.5% 的基质细胞组成,负责> 33% 的基质衍生的上皮受体配体 图4a
有趣的是,BDNF+ 基质亚型出现在第 4 周,并随着肺发育逐渐表现出细胞比例降低,这种细胞类型的特征基因,包括 BDNF 和 FGF10,在器官发生过程中也被下调.突出了这种 BDNF + 细胞群在早期胚胎阶段的独特作用。配体-受体相互作用分析进一步阐明,BDNF+ 基质亚型主要解释配体的表达,包括 BDNF、FGF9/10、WNT2/2B、LAMA5 和 NMU,其受体是上皮显性的,表明高度特异性的 BDNF+ 基质-上皮通讯
我们推断 BNDF+ 细胞衍生的配体与上皮细胞上的受体相互作用
- res 4 基质细胞来源的 SMC 命运分叉的特征
基质细胞来源的 VSMC 和 ASMC 对于调节肺功能至关重要。配体-受体相互作用分析表明,VSMC 和 ASMC 共同贡献了超过 30% 的基质-上皮细胞串扰,表明它们在肺上皮发育中起关键作用
编码 VSMC 和 ASMC 的发育起源和细胞规格的程序尚不清楚。因此,我们探讨了 SMC 的发育程序及其在上皮发育中的作用。
跨时间点(第 4-6 周)的单细胞谱分别显示在 UMAP 中,ASMC 和 VSMC 的轨迹评分分别用蓝色和绿色表示
有趣的是,WOT 推断一些基质细胞的细胞命运在第 4-5 周已经偏向于 ASMC 或 VSMC,早于从解剖学上观察到 ASMC(第 5 周)和 VSMC(第 7 周)的阶段
选择轨迹评分> 0.0001 的细胞作为 SMC 谱系。力导向布局显示 ASMC 和 VSMC 发展明显分叉(图 5b)。通过沿 Palantir伪时间排列单元格(5c),我们发现间充质细胞的核心 TFs,例如MEIS2、SHOX2、TWIST1
,随着两种类型的 SMC 的发育而逐渐下调,而真正的 SMC 标志物 ACTA2 和 TAGLN 在 ASMC 和 VSMC 群体中表达并逐渐上调。已报道的 VSMC 和 ASMC 标志物 MEF2C 和 MYH11 的表达水平沿其相应的轨迹特异性上调。
我们确定了经典调节因子,分别用于 VSMC 的 MEF2C 和用于 ASMC 的 ZEB1,验证了该工作流程的稳健性
发现了用于 SMC 开发的新型 TFs,即用于 VSMC 的 EBF1 和用于 ASMC 的 FOXF1
......【讨论了很多基因】
学习一下方法:
scRNA-seq analysis
Processing of sequencing files
The FASTQ files of single-cell libraries generated from Illumina NovaSeq system underwent adaptor index removal byTrim Galore (0.5.0)
. The clean FASTQ files were aligned to the Hg38 and Mm10 genomes with human and mouse gene annotation based onGencode
v35 and vM21 versions, respectively, bySTARsolo function of STAR (2.7.6a)
. For details on the quality of reads, please see Supplementary information, Table S9.Cell filtering, clustering, and visualization
Low-quality cells were filtered out by the number of UMIs and total counts on each sample. We usedScrublet
(1.0) to predict and exclude potential multiplets. To minimize batch effects, we defined the major cell clusters for each sample separately. Briefly, we appliedScanpy
(1.7.2)to reduce the dimensionality of cells byPCA
and perform clustering and visualization byLeiden
andUMAP algorithms
. Cell clusters wereassigned to known lung cell types
based on cell type-specific markers. See Supplementary information,Table S9
for the details on data quality control.
For the visualization of all scRNA-seq data, we first identified the highlyvariable TFs
(scanpy.pp.highly_variable_genes) anddifferentially expressed genes
(DEGs) (P value < 0.001
, Wilcoxon rank-sum test) among major cell types. These TFs and DEGs (1144 genes) were used forcomputing PCA
. We next applied FFT-accelerated Interpolation-based t-SNE73,74 (FIt-SNE) algorithm to display the layout in two dimensions using the top 10 principal components.Identification of the major cell type markers
All data were normalized using functionsscanpy.pp.normalize
andscanpy.pp.log1p
. DEGs of each major cell type were calculated based on the normalized data (P value < 0.01
, Wilcoxon rank-sum test,log fold change ≥ 0.25
). DEGs were listed in Supplementary information, Table S1.Clustering and defining cell subtypes
Top highly variable genes (HVGs)
of each sample were selected and merged. After PCA-based dimensionality reduction, we used harmony to correct batch effects based on top 50 PCs viaharmony.run_harmony
(max_iter_harmony = 15, max_iter_kmeans = 10
) implemented byPython package harmonypy
(0.0.6). Cell clusters were identified using the Leiden algorithm (resolution = 0.5
, Scanpy) based on harmony space. Genes withP value < 0.01
(Wilcoxon rank-sum test) and genelog fold change ≥ 0.25
for each cluster were selected as DEGs.Clusters sharing top DEGs (n = 20)
weremerged
before being classified as cell subtypes. DEGs of cell subtypes can be found in Supplementary information, Table S2.Trajectory
inference for lung epithelial cell and SMC development
To map the developmental trajectories of lung epithelial cells, we integrated and analyzed the dataset obtained in this study and the two datasets from 10× Genomics platform: Miller et al.9 (ArrayExpress: E-MTAB-8221) of 11.5 weeks, 15.0 weeks, 18.0 weeks, 21.0 weeks and Travaglini et al.13 (EGA: EGAS00001004344) of adult lung. The union of TFs within the top 5000 HVGs and top 100 HVGs per dataset (643 genes) were selected for analysis. Next, we appliedmonocle3
to pre-process data, reduce dimensionality, and visualize trajectory. To discern proximal and distal epithelium lineages, as well as VSMC and ASMC, we computed developmental trajectories utilizingWOT
(1.0.8.post2),20 an optimal transport-based approach to infer ancestor–descendant relationships between cells across adjacent time points of development. We obtained the cell trajectory scores via tmap_model.ancestors and tmap_model.trajectories implemented in the Python package WOT.
For VSMC and ASMC trajectory inference, the cells of weeks 4–5 from SC_Early withtrajectory scores > 0.0001
for VSMC and ASMC were considered asSMC progenitors
. The predicted progenitors combined with SMCs were selected to perform trajectory analysis again using Python packageharmonyTS
(0.1.4).77 The developmental pseudotime and branch probability of VSMC and ASMC were obtained via Python package Palantir (1.0.0).61 Then, we smoothed the gene expression via R packagegam
(1.20.1). The heatmaps illustrating the gene expression tendency along the developmental trajectory of ASMC and VSMC were generated by R packageComplexHeatmap
(2.14.0) (Fig. 5c). To observe the bias of cell fate at weeks 4, 5, and 6, we estimated the cell fate probability viatmap_model.fates
implemented in Python packageWOT
.TF activity inference
To discover the transcriptional regulators of lung epithelial proximal–distal patterning, we appliedpySCENIC
(0.11.0) algorithm to identify significant regulons (NES > 1). We then calculated the TF activity based on gene expression levels of each regulon. Weighted sums of TF activities were calculated based onWOT trajectory scores
at each time point. The motifs listed in this study were validated byhttps://jaspar.genereg.net
. For details on the TF list, please see Supplementary information, Tables S3 and S7.RNA velocity analysis
We applied thescVelo
(0.2.3) package to calculate the RNA velocity of epithelial cells at weeks 4, 6 and 8. For each time point, thetop 10 PCs
oftop 3000 HVGs
were used as input. For details of these processes, please see the scVelo pipeline (https://scvelo.readthedocs.io/index.html).Comparison of TFs between
human and mouse epithelia
We collected the mouse lung epithelial single-cell data from E12.0 to E14.0 with an interval of 0.5 days. Low-quality cells were filtered out by the number of detectedgenes (< 1800 & > 11,000)
and totalcounts (counts > 100,000)
. We usedScrublet
to predict and exclude potential doublet cells (score > 0.4
). Clustering and visualization were implemented by functionsscanpy.tl.leiden
and scanpy.tl.umap, respectively. The Nkx2-1 (the core TF for lung epithelium development) and Cdh1 (a universal marker for epithelial cells) positive populations were selected as the mouse lung epithelial cells. Sox2/Sox21/Klf5- and Sox9/Gata6/Etv5-expressing cells were annotated as proximal and distal epithelial cells, respectively.
Human–mouse orthologous genes were used for comparison (Human and Mouse Homologs from MGI database, https://www.informatics.jax.org/homology.shtml). We calculated the differentially expressed TFs in human and mouse (P value < 0.01, t-test
), respectively. The TFs were divided into 5 categories: proximal shared, distal shared, human-specific, mouse-specific and human–mouse inverse. The TF list can be found in Supplementary information, Table S5.Comparison of human and mouse stromal cells
To compare mesenchymal lineages between human and mouse, we integrated and analyzed stromal cells from this study and the Goodwin et al.47 (GEO:GSE153069
) mouse dataset from the 10× Genomics platform. Orthologous genes in human and mouse were extracted for comparison. Then, we calculated species-specific and cell type-specific DEGs across human and mouse stromal cell types at 6–7 weeks and E11.5 (P value < 0.01
, Wilcoxon rank-sum test). These DEG sets included ‘Common markers for stromal cells’, ‘Shared DEGs’, ‘Human SC_BDNF+ specific’, ‘Mouse mesothelium-specific’ and ‘Mouse sub-meso-specific’. For details on DEGs, please see Supplementary information, Table S10.Ligand–receptor interaction analysis
We integrated the ligand–receptor databases,cellphoneDB
and intact (https://www.ebi.ac.uk/intact). The product of the expression percentage (nUMI > 0) of a given ligand and its receptor from the corresponding provider and recipient cells was calculated as the ligand–receptor score. To calculate the significance of the interaction for each ligand–receptor pair, cell type labels were permuted 1000 times to generate the background distribution of ligand–receptor scores (standardized to mean of 0 and standard deviation of 1). P value < 0.001 (U-test) was considered significant. For details on the ligand–receptor list, please see Supplementary information, Table S6.
Spatial transcriptomic analysis
Processing of sequencing files
The FASTQ files of the spatial transcriptomic library was generated by MGISEQ2000. The FASTQ files were aligned to the hg38 genome with human gene annotation based onGencode v32
usingspaceranger (2.0.0)
. The serial number of theVisum slide
is V11T16-101, the slide file can be downloaded fromhttps://support.10xgenomics.com/spatial-gene-expression/software/pipelines/latest/using/slidefile-download
. For the details of data quality, please see Supplementary information, Table S9.Basic analysis and visualization of the Visium data
Low-quality spots were filtered out if the number of detectedgenes < 2000
. The spatial data were normalized via functionsscanpy.pp.normalize
andscanpy.pp.log1p
. Spatial visualization was performed via functionscanpy.pl.spatial
.Mapping single-cell data to spatial transcriptomic data
We usedTangram35 (1.0.3)
to map the annotated scRNA-seq data to the spatial spots. To match these two omics, we selected the 6-week data for mapping, using unnormalized UMI counts as input. For the spatial data, we calculated the Moran’I (I, a measure of spatial autocorrelation) for each gene. Genes with absolute Igreater than 0.05
were selected asspatially variable genes (SVGs)
, as calculated by squipy.gr.spatial_autocorr function of Python packagesquidpy
(1.2.3). A total of 2450 genes of SVGs and HVGs (from single-cell data) were used for mapping, resulting in a probability matrix of the assignment of each cell to all spots. To further access the proportion of cell types in each spot, we summed the probability of each single-cell-annotated cell type. For the details of mapping probability, please see Supplementary information, Table S11.The adjacent score for stromal cells and epithelial cells
We first identified spots surrounding the proximal or distal epithelium region. These spots, as well as those in the epithelial region, were defined as proximal- or distal-associated spots, respectively. To reduce the integration noise between single-cell and spatial data, cell types witha proportion less than 0.4
were excluded in each selected spot. The adjacent score of each cell type to the epithelium was calculated by summing their proportions in proximal- or distal-associated spots. The scores were displayed by Sankey diagram via sankeyNetwork function of R packagenetworkD3
(0.4). The information can be found in Supplementary information, Table S11.Evaluating the score of cellular interactions in Visium slides
We first assigned each cell to a spot based on the maximum mapping probability from Tangram, defined as a recipient cell (expressing receptors), surrounded by provider cells (expressing ligands) in the neighboring spots. Their ligand–receptor scores and P values were calculated as described in the Ligand–receptor interaction analysis section. The ligand–receptor score between distal epithelium and SC_BDNF+ were shown in Supplementary information, Fig. S7c.