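Trinity had always run without problems, but it suddenly started failing with an error I had never seen before. Stranger still, only some samples failed; the others ran normally.
The full error message was: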
Converting input files. (in parallel)Tuesday, April 4, 2023: 10:59:14 CMD: gunzip -c /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_1.clean.fq.gz | fastool --illumina-trinity --to-fasta >> left.fa 2> /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_1.clean.fq.gz.readcount
Tuesday, April 4, 2023: 10:59:14 CMD: gunzip -c /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_2.clean.fq.gz | fastool --illumina-trinity --to-fasta >> right.fa 2> /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_2.clean.fq.gz.readcount
Thread 1 terminated abnormally: Error, cmd: gunzip -c /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_1.clean.fq.gz | fastool --illumina-trinity --to-fasta >> left.fa 2> /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_1.clean.fq.gz.readcount died with ret 256 at /home/jjp/Software/miniconda3/bin/Trinity line 2183.
Thread 2 terminated abnormally: Error, counts of reads in FQ: 20882642 (as per gunzip -c /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_2.clean.fq.gz | wc -l) doesn't match fastool's report of FA records: 4497094 at /home/jjp/Software/miniconda3/bin/Trinity line 3060 thread 2.
main::ensure_complete_FQtoFA_conversion("gunzip -c /home/jjp/Project/trans_119/test/Unknown_BD459-02T0"..., "/home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_2.clea"...) called at /home/jjp/Software/miniconda3/bin/Trinity line 2099 thread 2
main::prep_seqs(ARRAY(0x55898fddbc28), "fq", "right", undef) called at /home/jjp/Software/miniconda3/bin/Trinity line 1313 thread 2
eval {...} called at /home/jjp/Software/miniconda3/bin/Trinity line 1313 thread 2
Trinity run failed. Must investigate error above.
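At first glance, the failure is at the step where gunzip is piped into fastool.
I then tried the two commands, gunzip and fastool, and both run normally on their own, so the problem is not that either program cannot be invoked.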
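When a pipeline like this fails, it helps to know which stage returned the non-zero status. The sketch below uses bash's `PIPESTATUS` array for that. The function name `pipeline_status` is made up for illustration, and the converter's standard output is discarded so nothing is written to left.fa or right.fa. Separately, if the `ret 256` in the Trinity message is a raw wait status as returned by Perl's `system` (an assumption about Trinity's internals), it corresponds to exit code 1, since 256 >> 8 = 1.

```bash
#!/usr/bin/env bash
# Run "gunzip -c FILE | CONVERTER ARGS..." and report the exit status of each stage.
# pipeline_status is a hypothetical helper name, not part of Trinity.
pipeline_status() {
    local fq_gz=$1
    shift
    # Converter output is discarded; only the exit codes are of interest here.
    gunzip -c "$fq_gz" | "$@" > /dev/null
    # PIPESTATUS must be captured immediately after the pipeline finishes.
    local codes=("${PIPESTATUS[@]}")
    echo "gunzip=${codes[0]} converter=${codes[1]}"
}
```

For the failing sample it could be called as `pipeline_status Unknown_BD459-02T0001_1.clean.fq.gz fastool --illumina-trinity --to-fasta`.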
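Since some files succeeded and others failed, the files themselves might have been the problem, so I checked the MD5 checksums of all the data. They were all fine.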
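A matching MD5 only shows that a file is identical to the copy the checksum was computed from. As a complementary check, one could also ask gzip to test each archive's internal consistency. This is a minimal sketch; the function name `check_gz_integrity` is made up for illustration.

```bash
#!/usr/bin/env bash
# Test each gzip archive with "gzip -t" and report the ones that fail.
# check_gz_integrity is a hypothetical helper name.
check_gz_integrity() {
    local status=0 f
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "FAILED  $f"
            status=1
        fi
    done
    # Returns 1 if at least one archive failed the test, 0 otherwise.
    return "$status"
}
```

It could be run over all inputs as `check_gz_integrity ./*.clean.fq.gz`.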
Looking at the log again, the error occurs when Trinity executes the following two commands:
gunzip -c /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_2.clean.fq.gz | fastool --illumina-trinity --to-fasta >> right.fa 2> /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_2.clean.fq.gz.readcount
gunzip -c /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_1.clean.fq.gz | fastool --illumina-trinity --to-fasta >> left.fa 2> /home/jjp/Project/trans_119/test/Unknown_BD459-02T0001_1.clean.fq.gz.readcount
So I ran these two commands by hand, and they completed successfully. After that, I ran the Trinity script again.
----------------------------------------------------------------------------------
-------------- Trinity Phase 1: Clustering of RNA-Seq Reads ---------------------
----------------------------------------------------------------------------------
Converting input files. (in parallel)Tuesday, April 4, 2023: 11:05:31 CMD: touch left.fa.ok right.fa.ok
Tuesday, April 4, 2023: 11:05:31 CMD: cat left.fa right.fa > both.fa
Tuesday, April 4, 2023: 11:05:32 CMD: touch both.fa.ok
-------------------------------------------
----------- Jellyfish --------------------
-- (building a k-mer catalog from reads) --
-------------------------------------------
* Running CMD: jellyfish count -t 40 -m 25 -s 61096915194 --canonical both.fa
* Running CMD: jellyfish dump -L 1 mer_counts.jf > jellyfish.kmers.fa
* Running CMD: jellyfish histo -t 40 -o jellyfish.kmers.fa.histo mer_counts.jf
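As the log shows, the run now proceeds normally: in Trinity Phase 1 the conversion step is skipped, and Trinity goes straight to CMD: cat left.fa right.fa > both.fa.
That essentially solves the problem. To summarize: the error occurs when Trinity itself calls gunzip and fastool, so the workaround is to do that step by hand beforehand to generate left.fa and right.fa, create Trinity's default output folder trinity_out_dir in advance, and put the two files in it. (A separate folder with a different name also works.)
Final code: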
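Because the original error was a mismatch between the FASTQ read count and the number of FASTA records, it seems worth confirming that the hand-made left.fa and right.fa are complete before handing them to Trinity. The sketch below assumes standard four-line FASTQ records; the function names are made up for illustration and are not Trinity's own check.

```bash
#!/usr/bin/env bash
# Number of reads in a gzipped FASTQ file, assuming 4 lines per record.
count_fastq_reads() {
    echo $(( $(gunzip -c "$1" | wc -l) / 4 ))
}

# Number of records in a FASTA file (lines starting with ">").
count_fasta_records() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^>' "$1" || true
}

# Compare the two counts; return 1 and print a message on mismatch.
verify_conversion() {
    local fq_gz=$1 fa=$2 n_fq n_fa
    n_fq=$(count_fastq_reads "$fq_gz")
    n_fa=$(count_fasta_records "$fa")
    if [ "$n_fq" -eq "$n_fa" ]; then
        echo "OK: $fa has $n_fa records"
    else
        echo "MISMATCH: $fq_gz has $n_fq reads, $fa has $n_fa records" >&2
        return 1
    fi
}
```

For example, `verify_conversion Unknown_BD459-02T0001_1.clean.fq.gz left.fa` could be run right after the manual conversion.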
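# Mem (memory limit in GB) and thread (CPU count) must be set before running this loop.
: "${Mem:?set Mem to the memory limit in GB}"
: "${thread:?set thread to the number of CPUs}"
for fn in *_1.clean.fq.gz
do
    sample=${fn%_1.clean*}
    left_all=${sample}_1.clean.fq.gz
    right_all=${sample}_2.clean.fq.gz
    # Create the per-sample Trinity output folder in advance.
    mkdir -p "${sample}_trinity"
    # Convert FASTQ to FASTA by hand. Use > instead of >> so that a rerun,
    # or a partial file left by a failed run, does not end up with appended reads.
    gunzip -c "./${left_all}" | fastool --illumina-trinity --to-fasta > "./${sample}_trinity/left.fa" 2> "./${left_all}.readcount"
    gunzip -c "./${right_all}" | fastool --illumina-trinity --to-fasta > "./${sample}_trinity/right.fa" 2> "./${right_all}.readcount"
    # With left.fa and right.fa already in the output folder, Trinity skipped
    # its own conversion step in the run shown above.
    Trinity \
        --seqType fq \
        --max_memory "${Mem}G" \
        --left "$left_all" \
        --right "$right_all" \
        --CPU "$thread" \
        --output "${sample}_trinity"
done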