RNA editing data analysis in endometrial cancer

References

[PMID: 27694136]
DRETools - F1000Research

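Data download

One RNA-seq dataset from our lab (GSE56087) and one unpublished circRNA-seq dataset are used as the discovery dataset and the validation dataset, respectively.

RNA-seq data download

Batch download with a shell script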

#!/bin/bash
# Download sequencing runs from the SRA with prefetch, then convert them to FASTQ with pfastq-dump.
# Prepare the download list (e.g. download_list.txt) in the current directory beforehand, one SRA run accession per line; see example.txt for the format.
# Exporting the list directly from the NCBI SRA Run Selector is recommended.
read -r -p "Download list file name: " downloadlist
read -r -p "Single-end (SE) or paired-end (PE) sequencing: " class
read -r -p "Number of threads to use: " thread
if [ "$class" = "SE" ]; then
    while read -r line
    do
        echo "$line"
        prefetch "$line" -o "./${line}.sra"
        pfastq-dump --threads "$thread" --outdir ./ "./${line}.sra"
    done < "$downloadlist"
elif [ "$class" = "PE" ]; then
    while read -r line
    do
        echo "$line"
        prefetch "$line" -o "./${line}.sra"
        pfastq-dump --split-3 --threads "$thread" --outdir ./ "./${line}.sra"
    done < "$downloadlist"
else
    echo "Invalid input!"
fi
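
Before a long download, it can help to check the list file for malformed lines. The following sketch is not part of the original script: `is_run_accession` and `check_list` are hypothetical helper names, and the accepted pattern (SRR, ERR, or DRR followed only by digits) is an assumption about what the list should contain.

```shell
# Hypothetical helper: succeed if $1 looks like an SRA run accession,
# i.e. SRR/ERR/DRR followed by digits only (assumed format).
is_run_accession() {
    case "$1" in
        [SED]RR*[!0-9]*) return 1 ;;   # a non-digit after the prefix
        [SED]RR[0-9]*)   return 0 ;;
        *)               return 1 ;;
    esac
}

# Hypothetical helper: print every line of the list file that fails the check.
check_list() {
    status=0
    while IFS= read -r line || [ -n "$line" ]; do
        if ! is_run_accession "$line"; then
            echo "suspicious line: '$line'"
            status=1
        fi
    done < "$1"
    return "$status"
}
```

Calling `check_list download_list.txt` before the download prints nothing when every line passes. A trailing carriage return left by Windows line endings counts as a non-digit, so such lines are reported as well.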

Corresponding accession numbers

SRR1200850
SRR1200851
SRR1200852
SRR1200853
SRR1200854
SRR1200855
SRR1200856
SRR1200857
SRR1200860
SRR1200861
SRR1200858
SRR1200859
SRR1200862
SRR1200863
SRR1200864
SRR1200865
SRR1200866
SRR1200867
SRR1200868
SRR1200869
SRR1200870
SRR1200871
SRR1200872
SRR1200873
SRR1200874
SRR1200875
SRR1200876
SRR1200877
SRR1200878
SRR1200879
SRR1200880
SRR1200881
SRR1200882
SRR1200883
SRR1200884
SRR1200885
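
Because these accessions form one contiguous block, the list file can also be generated instead of typed by hand. A small sketch, assuming the runs really are SRR1200850 through SRR1200885 with no gaps, as in the list above; `make_accession_list` is a hypothetical helper name.

```shell
# Hypothetical helper: print SRR run accessions for a contiguous
# numeric range, one per line.
make_accession_list() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "SRR${i}"
        i=$((i + 1))
    done
}

# Count the runs in the range used here (SRR1200850 to SRR1200885).
make_accession_list 1200850 1200885 | wc -l
```

Redirecting the output, for example `make_accession_list 1200850 1200885 > download_list.txt`, gives a file in the one-accession-per-line format the download script reads.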

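Data analysis

The analysis mainly uses RNAEditor. Set up the environment as described on the official website, and pay attention to the versions of the required software. The commands must be run from the RNAEditor directory. They cannot be run in an SSH session, where they fail with an error (the plots cannot be drawn), so open a terminal inside a graphical desktop session instead.

Batch analysis with a shell loop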

# Trim each paired-end run with fastp, then run RNAEditor on the trimmed reads.
# The range covers all 36 runs listed above, SRR1200850 through SRR1200885.
dir=~/RNASEQ/EC_lina
for ((i=1200850;i<1200886;i++))
do
fastp -w 16 -i "${dir}/SRR${i}_1.fastq" -I "${dir}/SRR${i}_2.fastq" -o "${dir}/SRR${i}_1_fastpedited.fastq" -O "${dir}/SRR${i}_2_fastpedited.fastq"
python RNAEditor.py -i "${dir}/SRR${i}_1_fastpedited.fastq" "${dir}/SRR${i}_2_fastpedited.fastq" -c configuration.txt
done
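
Since each sample runs for hours, it may be worth confirming that both FASTQ mates exist for every run before starting the loop. A minimal sketch, assuming the `SRR<number>_1.fastq` / `SRR<number>_2.fastq` naming used above; `missing_mates` is a hypothetical helper name.

```shell
# Hypothetical helper: print the FASTQ mates that are missing or empty
# for runs SRR<first>..SRR<last> in a directory.
# Usage: missing_mates <dir> <first> <last>
missing_mates() (
    dir=$1
    i=$2
    while [ "$i" -le "$3" ]; do
        for mate in 1 2; do
            f="$dir/SRR${i}_${mate}.fastq"
            # -s is true only for files that exist and are not empty
            [ -s "$f" ] || echo "$f"
        done
        i=$((i + 1))
    done
)
```

For the data here the call would be `missing_mates ~/RNASEQ/EC_lina 1200850 1200885`; no output means every pair is in place.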

The configuration file configuration.txt is as follows

#This file is used to configure the behaviour of RNAeditor

#Standard input files
refGenome = /home/zhou/rnaEditor_annotations/human/GRCH38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
gtfFile = /home/zhou/rnaEditor_annotations/human/GRCH38/Homo_sapiens.GRCh38.83.gtf
dbSNP = /home/zhou/rnaEditor_annotations/human/GRCH38/dbSNP.vcf
hapmap = /home/zhou/rnaEditor_annotations/human/GRCH38/HAPMAP.vcf
omni = /home/zhou/rnaEditor_annotations/human/GRCH38/1000GenomeProject.vcf
esp = /home/zhou/rnaEditor_annotations/human/GRCH38/ESP_filtered
aluRegions = /home/zhou/rnaEditor_annotations/human/GRCH38/repeats.bed
output = default
sourceDir = /usr/local/bin/
maxDiff = 0.04
seedDiff = 2
standCall = 0
standEmit = 0
edgeDistance = 3
paired = True 
keepTemp = True
overwrite = False
threads = 23

With 23 threads, each sample takes roughly ten hours to analyze.
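
With runs this long, a quick check that the paths named in configuration.txt exist can prevent a failed start. A minimal sketch, assuming the `key = value` layout shown above; `check_config_paths` is a hypothetical helper and not part of RNAEditor. An entry whose value is a file prefix rather than a complete file name may be reported even though it is fine.

```shell
# Hypothetical helper: report absolute paths in a "key = value" config
# file that do not exist on disk.
check_config_paths() {
    awk -F'=' '
        /^[ \t]*#/ { next }                   # skip comment lines
        NF == 2 {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)  # trim surrounding blanks
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (val ~ /^\//) print key, val   # keep absolute paths only
        }' "$1" |
    while read -r key path; do
        [ -e "$path" ] || echo "missing: $key -> $path"
    done
}
```

Running `check_config_paths configuration.txt` from the RNAEditor directory prints one line per missing path and nothing when all of them are found. Paths containing spaces are not handled by this sketch.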

Interpreting the results

Example results

[Image: example RNAEditor results]

VCF file

The VCF file contains all editing sites for further analysis.
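
As a first look at a VCF, the substitution types can be tallied from the REF and ALT columns (columns 4 and 5 of the VCF format). A-to-I editing is read as A>G, or as T>C for transcripts on the reverse strand. A minimal sketch with awk; `count_substitutions` is a hypothetical helper name.

```shell
# Hypothetical helper: tally REF>ALT substitution types in a VCF,
# skipping header lines that start with '#'.
count_substitutions() {
    awk -F'\t' '!/^#/ { n[$4 ">" $5]++ }
                END { for (k in n) print k "\t" n[k] }' "$1" | sort
}
```

It takes one of the `*.editingSites.vcf` files as its argument and prints one substitution type per line with its count.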



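GVF file

The GVF file stores additional information for each editing site, such as gene name, segment, total read count, edited read count, and editing ratio.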



Summary file

The summary file reports, for each gene, the number of RNA editing sites in each gene segment (e.g. 3' UTR, 5' UTR).



BED file

BED file of the editing islands, for downstream analyses such as visualization.
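
A quick summary of an island file needs only the standard BED columns. BED intervals are 0-based and half-open, so the length of an island is end minus start. A minimal sketch; `summarize_islands` is a hypothetical helper name, and the only assumption about the file is that it follows the standard three-column BED layout.

```shell
# Hypothetical helper: print the number of islands and their total
# length in bases, ignoring comment, track, and browser lines.
summarize_islands() {
    awk '!/^(#|track|browser)/ && NF >= 3 { n++; len += $3 - $2 }
         END { printf "%d islands, %d bp\n", n, len }' "$1"
}
```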



Archive all results except the bam, sam, and sai files

tar -cvf RNAEditor_1.tar --exclude='*bam' --exclude='*sam' --exclude='*sai' ~/RNASEQ/EC_lina/rnaEditor/
pigz RNAEditor_1.tar
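
After pigz has compressed the archive to RNAEditor_1.tar.gz, listing its contents is a cheap way to confirm that the exclusions took effect. A minimal sketch; `find_excluded_files` is a hypothetical helper name.

```shell
# Hypothetical helper: list entries ending in bam, sam, or sai in a
# gzip-compressed tar archive.
# Returns 0 if none are found, 1 if any are found, 2 if the archive
# cannot be read.
find_excluded_files() {
    tar -tzf "$1" > /dev/null || return 2
    if tar -tzf "$1" | grep -E '(bam|sam|sai)$'; then
        return 1
    fi
    return 0
}
```

`find_excluded_files RNAEditor_1.tar.gz` prints the offending entries, if there are any, and stays silent for a clean archive.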

Validation on the circRNA-seq dataset

Batch analysis with a shell script

#!/bin/bash
# Store the fastq file paths in an array; sorting keeps the two mates of each sample adjacent,
# assuming their names differ only in the mate label
c=0
for file in $(find ~/RNASEQ/CircularRNA-seq/rawdata/ -name '*fastq.gz' -print | sort)
do
  filelist[$c]=$file
  ((c++))
done

# Process the files in pairs: element i is mate 1, element i+1 is mate 2
for ((i=0;i<${#filelist[@]};i=i+2))
    do
        tmp1=$(echo "${filelist[$i]}"|sed 's/\.fastq\.gz/_fastpedited.fastq/')
        tmp2=$(echo "${filelist[$i+1]}"|sed 's/\.fastq\.gz/_fastpedited.fastq/')
        # fastp preprocessing
        fastp -w 16 -i "${filelist[$i]}" -I "${filelist[$i+1]}" -o "$tmp1" -O "$tmp2"
        #pigz -d $tmp1
        #pigz -d $tmp2
        python RNAEditor.py -i "$tmp1" "$tmp2" -c configuration.txt
    done
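
The output names can also be derived with shell parameter expansion instead of sed, and a directory comparison gives a cheap sanity check that two consecutive array entries belong to the same sample. A minimal sketch; both function names are hypothetical, and the check assumes that each sample keeps its two FASTQ files in its own directory.

```shell
# Hypothetical helper: fastp output name for a gzip-compressed FASTQ path.
# Strips a trailing ".fastq.gz" and appends "_fastpedited.fastq".
fastp_output_name() {
    echo "${1%.fastq.gz}_fastpedited.fastq"
}

# Hypothetical helper: succeed if two paths lie in the same directory.
same_directory() {
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}
```

Inside the pairing loop, a guard such as `same_directory "${filelist[$i]}" "${filelist[$i+1]}" || continue` would skip a pair whose files come from different directories.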

configuration.txt is unchanged

Archive all results except the bam, sam, and sai files

tar -cvf RNAEditor_2.tar --exclude='*bam' --exclude='*sam' --exclude='*sai' ~/RNASEQ/CircularRNA-seq/rawdata/Sample_ZYF-*/rnaEditor
pigz RNAEditor_2.tar

Downstream analysis

Differential analysis with DREtools

#Merge editing sites from multiple samples
mkdir -p DRE
dretools edsite-merge --min-editing 3 --min-coverage 5 --min-samples 3 --vcf ./*/*.editingSites.vcf > DRE/consensus_sites.vcf
#Calculate sample-wise EPK
for i in $(find ./SRR*/ -name '*fastpedited.bam' -print)
do
name=$(echo "${i}"|sed 's/_1.*bam//'|sed 's/\.\///')
dretools sample-epk --name "$name" --vcf ./DRE/consensus_sites.vcf --alignment "$i" > "./DRE/${name}.sample_epk.tsv"
done
#Calculate site-wise EPK
for i in $(find ./SRR*/ -name '*fastpedited.bam' -print)
do
name=$(echo "${i}"|sed 's/_1.*bam//'|sed 's/\.\///')
dretools edsite-epk --vcf ./DRE/consensus_sites.vcf --alignment "$i" > "./DRE/${name}.edsite_epk.tsv"
done
#Find differentially edited editing sites
dretools edsite-diff --max-depth-cov 5.0 --min-depth 2 \
--names normal,tumor \
--sites ./DRE/consensus_sites.vcf \
--sample-epk ./DRE/SRR1200850.sample_epk.tsv,./DRE/SRR1200851.sample_epk.tsv,./DRE/SRR1200854.sample_epk.tsv,./DRE/SRR1200855.sample_epk.tsv,./DRE/SRR1200858.sample_epk.tsv,./DRE/SRR1200859.sample_epk.tsv,./DRE/SRR1200862.sample_epk.tsv,./DRE/SRR1200863.sample_epk.tsv,./DRE/SRR1200866.sample_epk.tsv,./DRE/SRR1200867.sample_epk.tsv,./DRE/SRR1200872.sample_epk.tsv,./DRE/SRR1200873.sample_epk.tsv,./DRE/SRR1200874.sample_epk.tsv,./DRE/SRR1200875.sample_epk.tsv,./DRE/SRR1200878.sample_epk.tsv,./DRE/SRR1200879.sample_epk.tsv,./DRE/SRR1200882.sample_epk.tsv,./DRE/SRR1200883.sample_epk.tsv \
./DRE/SRR1200852.sample_epk.tsv,./DRE/SRR1200853.sample_epk.tsv,./DRE/SRR1200856.sample_epk.tsv,./DRE/SRR1200857.sample_epk.tsv,./DRE/SRR1200860.sample_epk.tsv,./DRE/SRR1200861.sample_epk.tsv,./DRE/SRR1200864.sample_epk.tsv,./DRE/SRR1200865.sample_epk.tsv,./DRE/SRR1200868.sample_epk.tsv,./DRE/SRR1200869.sample_epk.tsv,./DRE/SRR1200870.sample_epk.tsv,./DRE/SRR1200871.sample_epk.tsv,./DRE/SRR1200876.sample_epk.tsv,./DRE/SRR1200877.sample_epk.tsv,./DRE/SRR1200880.sample_epk.tsv,./DRE/SRR1200881.sample_epk.tsv,./DRE/SRR1200884.sample_epk.tsv,./DRE/SRR1200885.sample_epk.tsv  \
--site-epk ./DRE/SRR1200850.edsite_epk.tsv,./DRE/SRR1200851.edsite_epk.tsv,./DRE/SRR1200854.edsite_epk.tsv,./DRE/SRR1200855.edsite_epk.tsv,./DRE/SRR1200858.edsite_epk.tsv,./DRE/SRR1200859.edsite_epk.tsv,./DRE/SRR1200862.edsite_epk.tsv,./DRE/SRR1200863.edsite_epk.tsv,./DRE/SRR1200866.edsite_epk.tsv,./DRE/SRR1200867.edsite_epk.tsv,./DRE/SRR1200872.edsite_epk.tsv,./DRE/SRR1200873.edsite_epk.tsv,./DRE/SRR1200874.edsite_epk.tsv,./DRE/SRR1200875.edsite_epk.tsv,./DRE/SRR1200878.edsite_epk.tsv,./DRE/SRR1200879.edsite_epk.tsv,./DRE/SRR1200882.edsite_epk.tsv,./DRE/SRR1200883.edsite_epk.tsv \
./DRE/SRR1200852.edsite_epk.tsv,./DRE/SRR1200853.edsite_epk.tsv,./DRE/SRR1200856.edsite_epk.tsv,./DRE/SRR1200857.edsite_epk.tsv,./DRE/SRR1200860.edsite_epk.tsv,./DRE/SRR1200861.edsite_epk.tsv,./DRE/SRR1200864.edsite_epk.tsv,./DRE/SRR1200865.edsite_epk.tsv,./DRE/SRR1200868.edsite_epk.tsv,./DRE/SRR1200869.edsite_epk.tsv,./DRE/SRR1200870.edsite_epk.tsv,./DRE/SRR1200871.edsite_epk.tsv,./DRE/SRR1200876.edsite_epk.tsv,./DRE/SRR1200877.edsite_epk.tsv,./DRE/SRR1200880.edsite_epk.tsv,./DRE/SRR1200881.edsite_epk.tsv,./DRE/SRR1200884.edsite_epk.tsv,./DRE/SRR1200885.edsite_epk.tsv  \
> ./DRE/diff_sites.tsv
#Detect editing islands
dretools find-islands \
    --min-editing 3   \
    --min-coverage 5  \
    --min-length 20   \
    --min-points 5    \
    --epsilon 50      \
    --vcf ./*/*.editingSites.vcf  > ./DRE/islands.bed
#Calculate Island-EPK
for i in $(find ./SRR*/ -name '*fastpedited.bam' -print)
do
name=$(echo "${i}"|sed 's/_1.*bam//'|sed 's/\.\///')
dretools region-epk --regions ./DRE/islands.bed --vcf ./DRE/consensus_sites.vcf --alignment "$i" > "./DRE/${name}.island_epk.tsv"
done
#Find differentially edited islands
dretools region-diff --regions ./DRE/islands.bed --min-area 1 --min-depth 1 \
--names normal,tumor \
--sample-epk ./DRE/SRR1200850.sample_epk.tsv,./DRE/SRR1200851.sample_epk.tsv,./DRE/SRR1200854.sample_epk.tsv,./DRE/SRR1200855.sample_epk.tsv,./DRE/SRR1200858.sample_epk.tsv,./DRE/SRR1200859.sample_epk.tsv,./DRE/SRR1200862.sample_epk.tsv,./DRE/SRR1200863.sample_epk.tsv,./DRE/SRR1200866.sample_epk.tsv,./DRE/SRR1200867.sample_epk.tsv,./DRE/SRR1200872.sample_epk.tsv,./DRE/SRR1200873.sample_epk.tsv,./DRE/SRR1200874.sample_epk.tsv,./DRE/SRR1200875.sample_epk.tsv,./DRE/SRR1200878.sample_epk.tsv,./DRE/SRR1200879.sample_epk.tsv,./DRE/SRR1200882.sample_epk.tsv,./DRE/SRR1200883.sample_epk.tsv \
./DRE/SRR1200852.sample_epk.tsv,./DRE/SRR1200853.sample_epk.tsv,./DRE/SRR1200856.sample_epk.tsv,./DRE/SRR1200857.sample_epk.tsv,./DRE/SRR1200860.sample_epk.tsv,./DRE/SRR1200861.sample_epk.tsv,./DRE/SRR1200864.sample_epk.tsv,./DRE/SRR1200865.sample_epk.tsv,./DRE/SRR1200868.sample_epk.tsv,./DRE/SRR1200869.sample_epk.tsv,./DRE/SRR1200870.sample_epk.tsv,./DRE/SRR1200871.sample_epk.tsv,./DRE/SRR1200876.sample_epk.tsv,./DRE/SRR1200877.sample_epk.tsv,./DRE/SRR1200880.sample_epk.tsv,./DRE/SRR1200881.sample_epk.tsv,./DRE/SRR1200884.sample_epk.tsv,./DRE/SRR1200885.sample_epk.tsv  \
--region-epk  ./DRE/SRR1200850.island_epk.tsv,./DRE/SRR1200851.island_epk.tsv,./DRE/SRR1200854.island_epk.tsv,./DRE/SRR1200855.island_epk.tsv,./DRE/SRR1200858.island_epk.tsv,./DRE/SRR1200859.island_epk.tsv,./DRE/SRR1200862.island_epk.tsv,./DRE/SRR1200863.island_epk.tsv,./DRE/SRR1200866.island_epk.tsv,./DRE/SRR1200867.island_epk.tsv,./DRE/SRR1200872.island_epk.tsv,./DRE/SRR1200873.island_epk.tsv,./DRE/SRR1200874.island_epk.tsv,./DRE/SRR1200875.island_epk.tsv,./DRE/SRR1200878.island_epk.tsv,./DRE/SRR1200879.island_epk.tsv,./DRE/SRR1200882.island_epk.tsv,./DRE/SRR1200883.island_epk.tsv \
./DRE/SRR1200852.island_epk.tsv,./DRE/SRR1200853.island_epk.tsv,./DRE/SRR1200856.island_epk.tsv,./DRE/SRR1200857.island_epk.tsv,./DRE/SRR1200860.island_epk.tsv,./DRE/SRR1200861.island_epk.tsv,./DRE/SRR1200864.island_epk.tsv,./DRE/SRR1200865.island_epk.tsv,./DRE/SRR1200868.island_epk.tsv,./DRE/SRR1200869.island_epk.tsv,./DRE/SRR1200870.island_epk.tsv,./DRE/SRR1200871.island_epk.tsv,./DRE/SRR1200876.island_epk.tsv,./DRE/SRR1200877.island_epk.tsv,./DRE/SRR1200880.island_epk.tsv,./DRE/SRR1200881.island_epk.tsv,./DRE/SRR1200884.island_epk.tsv,./DRE/SRR1200885.island_epk.tsv  \
> ./DRE/diff_islands.tsv
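
The long comma-separated file lists above are easy to mistype. One way to build them is from two short lists of accessions. A minimal sketch; `epk_list` is a hypothetical helper name, and the group assignment simply copies the order of the lists in the commands above (first list as normal, second as tumor, following `--names normal,tumor`).

```shell
# Hypothetical helper: join "<dir>/<accession><suffix>" for every
# accession given, separated by commas.
# Usage: epk_list <dir> <suffix> <accession>...
epk_list() (
    dir=$1
    suffix=$2
    shift 2
    out=""
    for acc in "$@"; do
        out="${out:+$out,}${dir}/${acc}${suffix}"
    done
    echo "$out"
)

normal="SRR1200850 SRR1200851 SRR1200854 SRR1200855 SRR1200858 SRR1200859 SRR1200862 SRR1200863 SRR1200866 SRR1200867 SRR1200872 SRR1200873 SRR1200874 SRR1200875 SRR1200878 SRR1200879 SRR1200882 SRR1200883"
tumor="SRR1200852 SRR1200853 SRR1200856 SRR1200857 SRR1200860 SRR1200861 SRR1200864 SRR1200865 SRR1200868 SRR1200869 SRR1200870 SRR1200871 SRR1200876 SRR1200877 SRR1200880 SRR1200881 SRR1200884 SRR1200885"

# $normal and $tumor are left unquoted on purpose so that they split
# into one argument per accession, for example:
#   --sample-epk "$(epk_list ./DRE .sample_epk.tsv $normal)" \
#                "$(epk_list ./DRE .sample_epk.tsv $tumor)"
```

The same call with `.edsite_epk.tsv` or `.island_epk.tsv` as the suffix produces the lists for `--site-epk` and `--region-epk`.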

DREtools results

[Image: DREtools results]

Intersection with MET-DB
