10X单细胞 & 10XATAC联合之调控网络分析（IReNA）

hello，大家好，这一篇给大家带来的是10X单细胞和10XATAC联合进行网络调控的分析，也很经典，参考的文献在IReNA: integrated regulatory network analysis of single-cell transcriptomes，我们看看分析的方法和内容,关于10X单细胞联合ATAC分析调控网络的内容，大家可以参考我之前的文章10X单细胞（10X空间转录组）基因调控网络分析之Pando

图片.png

好了，开始分享

IReNA（Integrated Regulatory Network Analysis）是一个用于执行regulatory network analysis的 R 包。 IReNA 包含两种重建基因调控网络的方法。第一种是单独使用单细胞 RNA 测序 (scRNA-seq) 数据。第二个是使用测序 (ATAC-seq) 整合 scRNA-seq 数据和来自 Assay for Transposase Accessible Chromatin 的染色质可及性概况。 IReNA 执行模块化调控网络，以揭示模块之间的关键转录因子和重要调控关系，提供有关调控机制的生物学见解。

For users who want to use ATAC-seq data to refine regulatory relationships, Computer or server of linux system and the following software are needed: samtools, bedtools and fimo.

安装

if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install(version = "3.12")
BiocManager::install(c('Rsamtools', 'ChIPseeker', 'monocle',
                       'RcisTarget', 'RCy3', 'clusterProfiler'))
install.packages("devtools")
devtools::install_github("jiang-junyao/IReNA")
library(IReNA)

Workflow

图片.png

Inputs of IReNA

如果仅使用 scRNA-seq 或大量 RNA-seq 数据运行 IReNA，则需要以下输入：(2) scRNA-seq 或bulk RNA-seq 数据的Raw counts和 (3) 物种的参考基因组 (4) GTF 文件 (5) 转录因子的Motif information。

If both ATAC-seq data and scRNA-seq data are used to perform IReNA, the following files are needed: (1) Bam, peak and footprint files of ATAC-seq data (2) Raw counts of scRNA-seq or bulk RNA-seq data (3) Reference genome of the species (5) Motif information of transcription factors.

(1) Bam, peak and footprint files of ATAC-seq data
(2) Seurat object and monocle object / Raw counts of scRNA-seq or bulk RNA-seq data
对于分析过seurat对象和monocle对象的文件，IReNA提供了将monocle对象的伪时间添加到seurat对象的meta数据中的功能（该功能仅支持monocle2的monocle对象）。然后，继续下一步。

###Add pseudotime to the Seurat object
seurat_with_time <- add_pseudotime(seurat_object, monocle_object)

对于只有raw counts的分析文件，IReNA 提供了“load_counts”函数来加载 scRNA-seq 数据的raw counts，并返回 seurat 对象。如果数据是 10X 格式，设置参数‘datatype = 0’。如果数据为普通计数格式（'txt'为文件名后缀），则设置参数'dayatype =1'。

### load 10X counts
seurat_object <- load_counts('10X_data/sample1/', datatype = 0)
### load normal counts
seurat_object <- load_counts('test_data.txt',datatype = 1)

如果使用bulk RNA-seq 数据来识别基本的调控关系，只需加载bulk RNA-seq 数据的raw counts，然后使用与 scRNA-seq 数据相同的代码进行进一步分析。 建议只加载差异表达基因的表达谱，否则将花费更多的时间来计算每个基因对的相关性。 Deseq2 和 edgeR 可用于识别bulk RNA-seq 数据中差异表达的基因。

(3) Reference genome of the species
(4) GTF file
Gene transfer format, you can download it from http://www.ensembl.org/info/data/ftp/index.html
(5) Motif information of transcription factors
IReNA 包含源自 TRANSFAC 版本 201803 的四种物种（智人、Mus musculus、斑马鱼和鸡）的 DNA 模体数据集。以下代码用于从 TRANSFAC 或user定义的模体数据集中获取模体数据集，其格式应与这些相同来自 TRANSFAC 数据库。

library(IReNA)
###call Mus musculus motif database
motif1 <- Tranfac201803_Mm_MotifTFsF
###call Homo sapiens motif database
motif1 <- Tranfac201803_Hs_MotifTFsF
###call Zebrafish motif database
motif1 <- Tranfac201803_Zf_MotifTFsF
###call Chicken motif database
motif1 <- Tranfac201803_Ch_MotifTFsF

以下内容包含4个部分

IReNA 包含四个主要部分来重建监管网络：

第 1 部分：分析 scRNA-seq 或bulk RNA-seq 数据以获得基本的调控关系
第 2 部分：使用 Fimo（或替代选项：RcisTarget）refine regulatory relaionships（无 ATAC-seq 数据）
第 3 部分：分析 ATAC-seq 数据以refine regulatory relationships（使用 ATAC-seq 数据）
第 4 部分：监管网络分析和可视化

图片.png

Part 1: Analyze scRNA-seq or bulk RNA-seq data to get basic regulatory relationships

IReNA 支持三种输入格式：

scRNA-seq 或bulk RNA-seq 数据的raw count。 raw count可以通过 IReNA 中的“load_counts”函数加载并转换为 Seurat 对象；
包含raw count的 Seurat 对象。数据加载完毕后，IReNA会使用R包monocle计算pseudotime，并将pseudotime添加到Seurat对象的meta数据中。
meta数据中带有伪时间的 Seurat 对象。

Parameter ‘gene.use’ in ‘get_pseudotime’ function indicate the genes use to calculate pseudotime, if it’s null, this function will automatically use variable genes calculated by ‘FindVariableFeatures’ function in Seurat package to calculate pseudotime.

###Read seurat_object
seurat_object <- readRDS('seurat_object.rds')
###calculate the pseudotime and return monocle object
monocle_object <- get_pseudotime(seurat_object,gene.use)
###Add pseudotime to the Seurat object
###This function only support monocle object from monocle2
seurat_with_time <- add_pseudotime(seurat_object, monocle_object)

Next, use differentially expressed genes (DEGs) across the pseudotime to refine the seurat object, if you already have identified DEGs, you just need to run subset function in seurat:

### DEGs used here is the character class
seurat_with_time <- subset(seurat_with_time, features = DEGs)

I also recommend ‘differentialGeneTest’ function in monocle to calculate DEGs across pseudotime. DEGs will be used to make expression profile in further analysis. (If you use our test data, you can skip this part, because our test data only contains differentially expressed genes)

### identify DEGs across pseudotime (qvalue < 0.05 and num_cells_expressed > 0.1)
library(monocle)
monocle_object <- detectGenes(monocle_object, min_expr = 3)
monocle_object <- estimateDispersions(monocle_object)
diff1 <- monocle::differentialGeneTest(mo,fullModelFormulaStr = "~Pseudotime",relative_expr = TRUE)
sig_genes <- subset(diff1, qval < 0.05)
sig_genes <- subset(sig_genes, num_cells_expressed > 0.1)
### Use DEGs to refine seurat object
seurat_with_time <- subset(seurat_with_time, features = rownames(sig_genes))

Then, cells are divided into 50 bins across pseudotime. The bin is removed if all genes in this bin have no expression. The gene is filtered if absolute fold change < 0.01 (set by the parameter ‘FC’). Then, genes will be clustered through K-means algorithm (the number of clusters ‘K’ is set by the parameter ‘K1’).
If bulk RNA-seq data are used to identify regulatory relationships, load your expression matrix as expression_profile that generated by get_SmoothByBin_PseudotimeExp(), then run the following codes. I suggest to input expression profile that only contains differentially expressed genes, or you will a huge of time to calculate correlation of each gene pair.

###Get expression profiles ordered by pseudotime
expression_profile <- get_SmoothByBin_PseudotimeExp(seurat_with_time, Bin = 50)
###Filter noise and logFC in expression profile
expression_profile_filter <- fileter_expression_profile(expression_profile, FC=0.01)
###K-means clustering
clustering <- clustering_Kmeans(expression_profile_filter, K1=4)
clustering[1:5,1:5]
##        KmeansGroup FoldChangeQ95     SmExp1     SmExp2     SmExp3
## TCEB3            1      2.395806 -0.2424532 -0.8964990 -0.9124960
## CLK1             1      2.508335 -0.1819044  0.7624798  0.4867972
## MATR3            1      2.700294 -1.4485729  0.7837425  0.3028892
## AKAP11           1      2.415084 -0.6120681 -0.3849580  0.3898393
## HSF2             1      2.528111 -0.8125698 -0.6166004  0.8533309

Visualize the clustering of gene expression profiles through the heatmap

plot_kmeans_pheatmap(clustering,ModuleColor1 = c('#67C7C1','#5BA6DA','#FFBF0F','#C067A9'))

图片.png

由于Kmeans算法的特点，每次聚类都会得到不同的结果。

在第一列中添加基因的Ensmble ID，然后计算每个基因对的相关性并选择包含至少一个转录因子且绝对相关性> 0.4（由参数'correlation_filter'设置）的基因对。 ‘correlation_filter’的值取决于数据的噪声，数据噪声越大，需要越大的‘correlation_filter’来获得可信的相关关系。建议将 p 值 < 0.05 的相关性用于参数“correlation_filter”。

###Add Ensembl ID as the first column of clustering results
Kmeans_clustering_ENS <- add_ENSID(clustering, Spec1='Hs')
Kmeans_clustering_ENS[1:5,1:5]
##                 Symbol KmeansGroup FoldChangeQ95     SmExp1     SmExp2
## ENSG00000011007  TCEB3           1      2.395806 -0.2424532 -0.8964990
## ENSG00000013441   CLK1           1      2.508335 -0.1819044  0.7624798
## ENSG00000015479  MATR3           1      2.700294 -1.4485729  0.7837425
## ENSG00000023516 AKAP11           1      2.415084 -0.6120681 -0.3849580
## ENSG00000025156   HSF2           1      2.528111 -0.8125698 -0.6166004
### Caculate the correlation
motif1 <- Tranfac201803_Hs_MotifTFsF
### for 
regulatory_relationships <- get_cor(Kmeans_clustering_ENS, motif = motif1, correlation_filter = 0.6, start_column = 4)

Part 2: Analyze binding motifs of target genes to refine regulatory relaionships (without ATAC-seq data)

For ATAC-seq data is not available. IReNA uses fimo to check whether motifs of transcription factors that regulate the target gene occur in the upstream of target genes. If motifs of transcription factors that regulate the target gene exist, this gene pair will be retained.

First, get TSS (transcription start site) regions (default is upstream 1000 to downstream 500) of target genes. Gtf file used here is available from http://www.ensembl.org/info/data/ftp/index.html

gtf <- read.delim("D:/Homo_sapiens.GRCh38.104.gtf", header=FALSE, comment.char="#")
gene_tss <- get_tss_region(gtf,rownames(Kmeans_clustering_ENS))
head(gene_tss)
##              gene  chr    start      end
## 1 ENSG00000237973 chr1   630074   631574
## 2 ENSG00000116285 chr1  8027309  8025809
## 3 ENSG00000162441 chr1  9944407  9942907
## 4 ENSG00000116273 chr1  6612731  6614231
## 5 ENSG00000177674 chr1 11735084 11736584
## 6 ENSG00000171608 chr1  9650731  9652231

Then, get the sequence of these TSS regions based on reference genome (reference genome can be download from UCSC database), and use fimo to scan these sequence to check whether the motif of transcription factor that regulates the target gene occur in the promoter regions of the target gene. Because fimo software only supports linux environment, we generate some shell script to run Fimo software.

应该设置以下四个参数：

refdir：参考基因组路径
fimodir：Fimo 软件的路径。如果 Fimo 已经设置为全局环境变量，只需设置‘fimodir <- fimo’。
outputdir1：shell脚本的输出路径和目标基因tss区域的序列。（函数'find_motifs_targetgenes'会在'outputdir1'路径下自动生成两个文件夹（'fasta'和'fimo'），并存储目标基因tss区域的序列在“fimo”中的“fasta”和shell脚本中）
Motifdir：motif 文件的路径，可从 https://github.com/jiang-junyao/MEMEmotif 或 TRANSFAC 数据库下载。

Please note that, at the end of ‘outputdir1’,‘motifdir’ and ‘sequencedir’, the symbol ‘/’should be contained. What’s more, the chromosome name of your reference genome used here should be the same as the chromosome name in the gene_tss

### run the following codes in the R under linux environment
refdir='/public/home/user/genome/hg38.fa'
fimodir <- 'fimo'
outputdir1 <- '/public/home/user/fimo/'
motifdir <- '/public/home/user/fimo/Mememotif/'
find_motifs_targetgenes(gene_tss,motif1,refdir,fimodir,outputdir1,motifdir)

Then run the following shell codes to activate fimo

### run the following codes in the shell
cd /public/home/user/fimo/fimo
sh fimoall.sh

Then, Fimo result are stored in the ‘fimo’ folder under outputdir1, and we run the following R codes to refine regulatory relationships

motif1 <- Tranfac201803_Hs_MotifTFsF
outputdir <- paste0(outputdir1,'fimo/')
fimo_regulation <- generate_fimo_regulation(outputdir,motif1)
filtered_regulatory_relationships <- filter_regulation_fimo(fimo_regulation, regulatory_relationships)

Part 3: Analyze ATAC-seq data to refine regulatory relationships (with ATAC-seq data)

For ATAC-seq data is available, IReNA calculates the footprints of transcription factors and footprint occupancy score (FOS) to refine regulatory relationships. The footprints whose distance is less than 4 are merged and then the sequence of each footprint is obtained from the reference genome through the function ‘get_merged_fasta’. The reference genome should be in fasta/fa format which can be downloaded from UCSC or other genome database.

###merge footprints whose distance is less than 4
filtered_footprints <- read.table('footprints.bed',sep = '\t')
fastadir <- 'Genome/hg38.fa' 
merged_fasta <- get_merged_fasta(filtered_footprints,fastadir)
write.table(merged_fasta,'merged_footprints.fasta',row.names=F,quote=F,col.names=F)

After obtaining the motif sequences, use fimo software to identify binding motifs in the footprints. Because fimo software only supports linux environment, we generate a shell script to run Fimo software.

First, identify differentially expressed genes related motifs through the function ‘motif_select’, which will reduce the running time of the subsequent analysis process.

Then, you should set the following five parameters:

1. fimodir: path of Fimo software. If Fimo has been set to the global environment variable, just set ‘fimodir <- fimo’.
1. outputdir1: output path of the shell scripts.
1. outputdir: output path of Fimo result.
1. motifdir: path of the motif file, which can be downloaded from https://github.com/jiang-junyao/MEMEmotif or TRANSFAC database.
1. sequencedir: path of the sequence which is generated by the function ‘get_merged_fasta’.

Please note that, at the end of ‘outputdir’ and ‘sequencedir’ the symbol ‘/’should be contained.

### Identify differentially expressed genes related motifs
motif1 <- motifs_select(Tranfac201803_Hs_MotifTFsF, rownames(Kmeans_clustering_ENS)) ###Kmeans_clustering_ENS was obtained in part1
### run find_motifs()
fimodir <- 'fimo'
outputdir1 <- '/public/home/user/fimo/'
outputdir <- '/public/home/user/fimo/output/'
motifdir <- '/public/home/user/fimo/Mememotif/'
sequencedir <- '/public/home/user/fimo/merged_footprints.fasta'
find_motifs(motif1,step=20,fimodir, outputdir1, outputdir, motifdir, sequencedir)

Then run the following shell codes to activate fimo

### run the following commands in the shell
cd /public/home/user/fimo/ 
mkdir output
sh ./fimo_all.sh

Then, we combine these Fimo consequence in ‘outputdir’. Notably outputdir folder should only contain fimo result files. Next, we load the peaks file and overlap differential peaks and motif footprints through overlap_footprints_peaks() function

###Combine all footprints of motifs
combined <- combine_footprints(outputdir)
peaks <- read.delim('differential_peaks.bed')
overlapped <- overlap_footprints_peaks(combined,peaks)

However, the running time of overlap_footprints() is too long, so it’s highly recommanded to use bedtools to do overlap in linux system. If you want to use bedtools to do overlap, you need to output ‘combined_footprints’ dataframe, and transfer it to shell.(If you make analysis in linux system, ignore transfer part)

### output combined_footprints
write.table(combined,'combined.txt',quote = F,row.names = F,col.names = F,sep = '\t')
### Transfer combied.txt and differential_peaks.bed to linux system, and then run the following commands in the shell
bedtools intersect -a combined.txt -b differential_peaks.bed -wa -wb > overlappd.txt

Next, we intergrate bioconductor package ChIPseeker to get footprint-related genes. Before we run get_related_genes(), we need to specify TxDb, which can be download from: Bioconductor TxDb list. Kmeans_clustering_ENS used here was obtained in part1.

### If you make overlap by bedtools, read 'overlapped.txt' to R
overlapped <- read.table('overlapped.txt')
###get footprint-related genes
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
list1 <- get_related_genes(overlapped,txdb = txdb,motif=Tranfac201803_Mm_MotifTFsF,Species = 'Hs')
###Get candidate genes/TFs-related peaks
list2 <- get_related_peaks(list1,Kmeans_clustering_ENS)
### output filtered footprints
write.table(list2[[1]],'filtered_footprints.bed', quote = F, row.names = F, col.names = F, sep = '\t')

Then, because of the size of original BAM file is too large, so we need to use samtools to extract footprints realated regions in BAM to reduce running time of function which analyze bam files in IReNA. (If you use our test data, just skip this step)

### transfer filtered_footprints.bed to linux system and run the following codes
samtools view -hb -L filtered_footprint.bed SSC_patient1.bam > SSC1_filter.bam
samtools view -hb -L filtered_footprint.bed SSC_patient2.bam > SSC2_filter.bam
samtools view -hb -L filtered_footprint.bed esc.bam > esc_filter.bam

此外，通过 wig_track() 计算足迹中每个位置的切割，并使用这些切割来计算足迹的 FOS，以识别决定监管关系的丰富 TF。此处使用的regulatory_relationships 是在第1 部分中计算的。这里使用的参数 FOS_threshold 是过滤低质量足迹的阈值，您可以增加它以减少导出足迹的数量。

### calculate cuts of each each position in footprints
bamfilepath1 <- 'SSC1_filter.bam'
bamfilepath2 <- 'SSC2_filter.bam'
bamfilepath3 <- 'esc_filter.bam'
cuts1 <- wig_track(bamfilepath = bamfilepath1,bedfile = list2[[1]])
cuts2 <- wig_track(bamfilepath = bamfilepath2,bedfile = list2[[1]])
cuts3 <- wig_track(bamfilepath = bamfilepath3,bedfile = list2[[1]])
wig_list <- list(cuts1,cuts2,cuts3)
### get related genes of footprints with high FOS
potential_regulation <- Footprints_FOS(wig_list,list2[[2]], FOS_threshold = 0.1)
### Use information of footprints with high FOS to refine regulatory relationships
filtered_regulatory <- filter_ATAC(potential_regulation,regulatory_relationships)

Part 4: Regulatory network analysis and visualization

After we get ‘filtered_regulatory_relationships’ and ‘Kmeans_clustering_ENS’, we can reconstruct regulatory network. Run network_analysis() to get regulatory, this function will generate a list which contain the following 9 elements:

(1)Cor_TFs.txt: list of expressed TFs in the gene networks.
(2)Cor_EnTFs.txt: list of TFs which significantly regulate gene modules (or enriched TFs).
(3)FOSF_RegMTF_Cor_EnTFs.txt: regulatory pairs in which the source gene is enriched TF.
(4)FOSF_RegMTF_Cor_EnTFs.txt: regulatory pairs in which both source gene and target gene are enriched TFs.
(5)FOSF_RegMTF_Cor_EnTFs.txt: regulatory pairs only including regulations within each module but not those between modules, in this step
(6)TF_list: enriched TFs which significantly regulate gene modules
(7)TF_module_regulation: details of enriched TFs which significantly regulate gene modules
(8)TF_network: regulatory network for enriched transcription factors of each module
(9)intramodular_network: intramodular regulatory network

Here, we use refined regulatory relationships from part2 to reconstruct regulatory networks

TFs_list <- network_analysis(filtered_regulatory_relationships,Kmeans_clustering_ENS,TFFDR1 = 10,TFFDR2 = 10)

We can also make enrichment analysis for differentially expressed genes in each module. Before you run this function, you need to download the org.db for your species through BiocManager.

### Download Homo sapiens org.db
#BiocManger::install('org.Hs.eg.db')
library(org.Hs.eg.db)
### Enrichment analysis (KEGG)
enrichment_KEGG <- enrich_module(Kmeans_clustering_ENS, org.Hs.eg.db, enrich.db = 'KEGG',organism = 'hsa')
#enrichment_GO <- enrich_module(Kmeans_cluster_ENS, org.Hs.eg.db, 'GO')
head(enrichment_KEGG)
##                ID                 Description module -log10(q-value) GeneRatio
## hsa03010 hsa03010                    Ribosome      1        6.500237    13/101
## hsa03040 hsa03040                 Spliceosome      1        1.964315     9/101
## hsa03022 hsa03022 Basal transcription factors      1        1.964315     5/101
## hsa05016 hsa05016        Huntington's disease      1        1.126355     9/101
## hsa03013 hsa03013               RNA transport      1        1.126355     8/101
## hsa05200 hsa05200          Pathways in cancer      2        4.866522    38/276
##           BgRatio       pvalue     p.adjust       qvalue
## hsa03010  92/5894 3.661614e-09 3.258836e-07 3.160551e-07
## hsa03040 128/5894 3.138210e-04 1.119399e-02 1.085638e-02
## hsa03022  37/5894 3.773255e-04 1.119399e-02 1.085638e-02
## hsa05016 183/5894 3.932733e-03 7.708051e-02 7.475579e-02
## hsa03013 152/5894 4.330366e-03 7.708051e-02 7.475579e-02
## hsa05200 327/5894 1.153409e-07 1.949262e-05 1.359809e-05
##                                                                                                                                                                                                  geneID
## hsa03010                                                                                                                              6122/6143/6206/6194/51187/6135/6161/6167/6159/6166/9045/6136/6191
## hsa03040                                                                                                                                             10992/3312/9775/10569/83443/5356/10594/8449/494115
## hsa03022                                                                                                                                                                       9519/2962/6877/2957/6879
## hsa05016                                                                                                                                                 9519/5978/7019/51164/498/27089/54205/4512/2876
## hsa03013                                                                                                                                                     11218/8761/1983/9775/5042/50628/6612/10921
## hsa05200 8312/2263/2260/6932/7188/3912/6655/331/5595/7976/5296/1021/7473/4790/329/1027/2335/6772/6654/5290/5335/2258/3845/5156/54583/5899/3688/26060/3716/836/6774/8313/5293/3913/6777/5727/5154/112398
##          Count
## hsa03010    13
## hsa03040     9
## hsa03022     5
## hsa05016     9
## hsa03013     8
## hsa05200    38

Moreover, you can do GO analysis

library(org.Hs.eg.db)
### Enrichment analysis (GO)
enrichment_GO <- enrich_module(Kmeans_clustering_ENS, enrich.db = 'GO',org.Hs.eg.db)

You can visualize regulatory networks for enriched transcription factors of each module through plot_network() function by setting type parameter as ‘TF’. This plot shows regulatory relationships between transcription factors in different modules that significantly regulating other modules. The size of each vertex determine the significance of this transcription factor. Yellow edges are positive regulation, grey edges are negative regulation.

plot_tf_network(TFs_list)

图片.png

可以通过 plot_intramodular_network() 函数可视化具有丰富功能的模块内网络。在运行此功能之前，您可以选择要在图中显示的每个模块的丰富功能。如果输入所有丰富的函数，该函数将自动选择每个模块中-log10(qvalue) 最高的函数呈现在图中。此外，每个模块中边数最多的转录因子也将显示在图中。

### select functions that you want to present in the figure
enrichment_KEGG_select <- enrichment_KEGG[c(3,7,11),]
### plotting
plot_intramodular_network(TFs_list,enrichment_KEGG_select,layout = 'circle')

图片.png

It is strongly recommended to use Cytoscape to display the regulatory networks. We provide a function that can provide different Cytoscape styles. You need to install and open Cytoscape before running the function.

###optional: display the network in cytoscape, open cytoscape before running this function
initiate_cy(TFs_list, layout1='degree-circle', type='TF')
initiate_cy(TFs_list, layout1='grid', type='module')

These are the picture we processed through Cytoscape, which can show the regulatory relationship of modularized transcription factors.

Use Cytoscape to visualize regulatory network for enriched transcription factors of each module

图片.png

Use Cytoscape to visualize intramodular network

图片.png

Option: Use RcisTarget to refine regulatory relaionships (without ATAC-seq data)

IReNA also provides filter_regulation function (Based on RcisTarget) to refine regulation relationships. Due to the limitations of RcisTarget, this function currently only supports three species (Hs, Mm and Fly). So if the species of your data is not included, and you don’t have ATAC-seq data, you can use unrefined regulatory relaionships to perform part4 analysis directly.

Before run this function, you need to download Gene-motif rankings database from https://resources.aertslab.org/cistarget/ and set the Rankingspath1 as the path of the downloaded Gene-motif rankings database. If you don’t know which database to choose, we suggest that using ‘hg19-500bp-upstream-7species.mc9nr’ for human, using ‘mm9-500bp-upstream-10species.mc9nr’ for mouse, using ‘dm6-5kb-upstream-full-tx-11species.mc8nr’ for fly. You can download it manually, or use R code (If you are in mainland china, i suggest you to download database from website):

### Download Gene-motif rankings database
featherURL <- "https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg19/refseq_r45/mc9nr/gene_based/hg19-tss-centered-10kb-7species.mc9nr.feather"
download.file(featherURL, destfile=basename(featherURL)) # saved in current dir
### Refine regulatory relaionships
Rankingspath1 <- 'hg19-500bp-upstream-7species.mc9nr1.feather' # download from https://resources.aertslab.org/cistarget/
filtered_regulation_Rcis <- filter_regulation(Kmeans_clustering_ENS, regulatory_relationships, 'Hs', Rankingspath1)

生活很好，有你更好

最后编辑于：2021.11.30 17:23:12

禁止转载，如需转载请通过简信或评论联系作者。

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 194,670评论 5赞 460
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 81,928评论 2赞 371
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 141,926评论 0赞 320
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 52,238评论 1赞 263
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 61,112评论 4赞 356
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 46,138评论 1赞 272
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 36,545评论 3赞 381
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 35,232评论 0赞 253
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 39,496评论 1赞 290
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 34,596评论 2赞 310
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 36,369评论 1赞 326
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 32,226评论 3赞 313
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 37,600评论 3赞 299
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 28,906评论 0赞 17
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 30,185评论 1赞 250
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 41,516评论 2赞 341
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 40,721评论 2赞 335

10X单细胞 & 10XATAC联合之调控网络分析（IReNA）

好了，开始分享

For users who want to use ATAC-seq data to refine regulatory relationships, Computer or server of linux system and the following software are needed: samtools, bedtools and fimo.

安装

Workflow

Inputs of IReNA

如果仅使用 scRNA-seq 或大量 RNA-seq 数据运行 IReNA，则需要以下输入：(2) scRNA-seq 或bulk RNA-seq 数据的Raw counts和 (3) 物种的参考基因组 (4) GTF 文件 (5) 转录因子的Motif information。

If both ATAC-seq data and scRNA-seq data are used to perform IReNA, the following files are needed: (1) Bam, peak and footprint files of ATAC-seq data (2) Raw counts of scRNA-seq or bulk RNA-seq data (3) Reference genome of the species (5) Motif information of transcription factors.

以下内容包含4个部分

IReNA 包含四个主要部分来重建监管网络：

Part 1: Analyze scRNA-seq or bulk RNA-seq data to get basic regulatory relationships

IReNA 支持三种输入格式：

Next, use differentially expressed genes (DEGs) across the pseudotime to refine the seurat object, if you already have identified DEGs, you just need to run subset function in seurat:

Visualize the clustering of gene expression profiles through the heatmap

Part 2: Analyze binding motifs of target genes to refine regulatory relaionships (without ATAC-seq data)

应该设置以下四个参数：

Please note that, at the end of ‘outputdir1’,‘motifdir’ and ‘sequencedir’, the symbol ‘/’should be contained. What’s more, the chromosome name of your reference genome used here should be the same as the chromosome name in the gene_tss

Then run the following shell codes to activate fimo

Then, Fimo result are stored in the ‘fimo’ folder under outputdir1, and we run the following R codes to refine regulatory relationships

Part 3: Analyze ATAC-seq data to refine regulatory relationships (with ATAC-seq data)

Please note that, at the end of ‘outputdir’ and ‘sequencedir’ the symbol ‘/’should be contained.

Then run the following shell codes to activate fimo

Next, we intergrate bioconductor package ChIPseeker to get footprint-related genes. Before we run get_related_genes(), we need to specify TxDb, which can be download from: Bioconductor TxDb list. Kmeans_clustering_ENS used here was obtained in part1.

Part 4: Regulatory network analysis and visualization

We can also make enrichment analysis for differentially expressed genes in each module. Before you run this function, you need to download the org.db for your species through BiocManager.

It is strongly recommended to use Cytoscape to display the regulatory networks. We provide a function that can provide different Cytoscape styles. You need to install and open Cytoscape before running the function.

Use Cytoscape to visualize intramodular network

Option: Use RcisTarget to refine regulatory relaionships (without ATAC-seq data)

推荐阅读更多精彩内容