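Trinity is a widely used tool for transcriptome assembly and analysis that does not require a reference genome.
In this section we use Trinity on the server to assemble the clean data obtained in Section 1 into a transcriptome.
Along the way we cover software installation, environment variable configuration, and assembly of the transcriptome reads.
1. Installing Trinity
First download the software to the local machine, then transfer it to the server.
The release can be downloaded from Trinity's GitHub repository.
Then copy it to the server with scp.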
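The download and transfer can be sketched as a small script. This is only an illustration: the archive URL is inferred from the directory name used later in this post, `user@server:~/biosoft/` is a placeholder destination, and the `run` and `fetch_and_push` helpers are hypothetical names introduced here.

```shell
# Sketch: download a Trinity release locally and copy it to the server.
# ASSUMPTIONS: the archive URL and the remote destination are placeholders,
# not taken from the original text.
TRINITY_URL="https://github.com/trinityrnaseq/trinityrnaseq/archive/Trinity-v2.8.3.tar.gz"

# run: execute a command, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# fetch_and_push <remote-destination>
fetch_and_push() {
    remote="$1"
    archive="${TRINITY_URL##*/}"            # Trinity-v2.8.3.tar.gz
    run wget -O "$archive" "$TRINITY_URL"   # download on the local machine
    run scp "$archive" "$remote"            # copy to the server
}

# Example (prints the commands without running them):
# DRY_RUN=1 fetch_and_push 'user@server:~/biosoft/'
```

Setting DRY_RUN=1 makes it easy to review the exact commands before anything is downloaded or copied.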
Once it is on the server, extract the archive, enter the directory, and run make:
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ make
# make depends on cmake; if cmake is missing, install it first and add it to PATH
# Output like the following means the build succeeded
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Performing Unit Tests of Build
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Inchworm: has been Installed Properly
Chrysalis: has been Installed Properly
QuantifyGraph: has been Installed Properly
GraphFromFasta: has been Installed Properly
ReadsToTranscripts: has been Installed Properly
parafly: has been Installed Properly
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$
2. Configuring the path (the Trinity launcher is a Perl script)
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ export PATH=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3:$PATH
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ cd ..
yeyt@ubuntu:~/biosoft$ which Trinity
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3/Trinity
yeyt@ubuntu:~/biosoft$ Trinity -h
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = "en_US:en",
LC_ALL = (unset),
LC_PAPER = "zh_CN.UTF-8",
LC_ADDRESS = "zh_CN.UTF-8",
LC_MONETARY = "zh_CN.UTF-8",
LC_NUMERIC = "zh_CN.UTF-8",
LC_TELEPHONE = "zh_CN.UTF-8",
LC_IDENTIFICATION = "zh_CN.UTF-8",
LC_MEASUREMENT = "zh_CN.UTF-8",
LC_TIME = "zh_CN.UTF-8",
LC_NAME = "zh_CN.UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
###############################################################################
#
______ ____ ____ ____ ____ ______ __ __
| || \ | || \ | || || | |
| || D ) | | | _ | | | | || | |
|_| |_|| / | | | | | | | |_| |_|| ~ |
| | | \ | | | | | | | | | |___, |
| | | . \ | | | | | | | | | | |
|__| |__|\_||____||__|__||____| |__| |____/
Trinity-v2.8.3
#
#
# Required:
#
# --seqType <string> :type of reads: ('fa' or 'fq')
#
# --max_memory <string> :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
# provided in Gb of RAM, ie. '--max_memory 10G'
#
# If paired reads:
# --left <string> :left reads, one or more file names (separated by commas, no spaces)
# --right <string> :right reads, one or more file names (separated by commas, no spaces)
#
# Or, if unpaired reads:
# --single <string> :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
# Or,
# --samples_file <string> tab-delimited text file indicating biological replicate relationships.
# ex.
# cond_A cond_A_rep1 A_rep1_left.fq A_rep1_right.fq
# cond_A cond_A_rep2 A_rep2_left.fq A_rep2_right.fq
# cond_B cond_B_rep1 B_rep1_left.fq B_rep1_right.fq
# cond_B cond_B_rep2 B_rep2_left.fq B_rep2_right.fq
#
# # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ make plugins
# Install the plugins
## Checking plugin installations:
slclust: has been Installed Properly
collectl: has been Installed Properly
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$
# That completes the build; next, check the software Trinity depends on
#bowtie2
#jellyfish
#salmon
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ which bowtie2
/opt/biosoft/bowtie2-2.2.9//bowtie2
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ which jellyfish
/opt/biosoft/jellyfish-2.2.3/bin//jellyfish
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ which salmon
/opt/biosoft/salmon/bin//salmon
# Everything is in place, so continue
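The three `which` checks above can be folded into one loop. The following is a minimal sketch; `check_tools` is a hypothetical helper name, and the tool list is just the three dependencies named in this post.

```shell
# check_tools: report which of the given programs are on PATH.
# Returns non-zero if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "found:   $tool -> $path"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

check_tools bowtie2 jellyfish salmon || echo "install the missing tools and add them to PATH"
```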
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ pwd
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ export TRINITY_HOME=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ echo $TRINITY_HOME
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3
# Persist the environment variables in ~/.bashrc
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ echo 'export TRINITY_HOME=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3' >> ~/.bashrc
yeyt@ubuntu:~/biosoft$ echo 'export PATH=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3:$PATH' >> ~/.bashrc
yeyt@ubuntu:~/biosoft$ source ~/.bashrc
yeyt@ubuntu:~/biosoft$ which Trinity
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3/Trinity
Trinity can now be invoked directly by name.
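Running an `echo ... >> ~/.bashrc` command more than once leaves duplicate lines in the file. A small guard avoids that; this is a sketch in which `append_once` is a hypothetical helper, and the example writes to a scratch file rather than the real ~/.bashrc.

```shell
# append_once <line> <file>: append the line only if the file does not
# already contain exactly that line.
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example with a scratch file instead of the real ~/.bashrc
rc_file="$(mktemp)"
append_once 'export TRINITY_HOME=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3' "$rc_file"
append_once 'export TRINITY_HOME=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3' "$rc_file"
append_once 'export PATH=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3:$PATH' "$rc_file"
cat "$rc_file"
```

The second call is a no-op, so the scratch file ends up with two lines, one per variable.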
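3. Running Trinity
Here we follow the method described in the Nature Protocols paper.
1) Quality control and cleaning of the data (covered in the previous section)
2) Assembly of the transcriptome reads
First build the sample information table.
My samples comprise three treatments with two biological replicates each, and each replicate has two read files (_1 and _2), so the directory looks like this: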
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly$ l
B251_1.P.fq.gz B252_2.P.fq.gz R252_1.P.fq.gz W251_2.P.fq.gz samples.txt
B251_2.P.fq.gz R251_1.P.fq.gz R252_2.P.fq.gz W252_1.P.fq.gz
B252_1.P.fq.gz R251_2.P.fq.gz W251_1.P.fq.gz W252_2.P.fq.gz
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly$ cat samples.txt
B25 B251 B251_1.P.fq.gz B251_2.P.fq.gz
B25 B252 B252_1.P.fq.gz B252_2.P.fq.gz
R25 R251 R251_1.P.fq.gz R251_2.P.fq.gz
R25 R252 R252_1.P.fq.gz R252_2.P.fq.gz
W25 W251 W251_1.P.fq.gz W251_2.P.fq.gz
W25 W252 W252_1.P.fq.gz W252_2.P.fq.gz
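Writing samples.txt by hand is error-prone once there are many replicates, so the table can also be generated from the file names. The sketch below assumes the naming used in this post (`<replicate>_1.P.fq.gz` and `<replicate>_2.P.fq.gz`) and assumes that the condition is the replicate name without its last character (B251 gives B25); `make_samples` is a hypothetical helper. It writes real tab characters, which is what the Trinity help text above asks for.

```shell
# make_samples <dir>: print a tab-delimited Trinity samples table for the
# paired files <replicate>_1.P.fq.gz / <replicate>_2.P.fq.gz found in <dir>.
# ASSUMPTION: condition name = replicate name minus its last character.
make_samples() {
    dir="$1"
    for left in "$dir"/*_1.P.fq.gz; do
        [ -e "$left" ] || continue              # no matching files at all
        left_name=${left##*/}                   # B251_1.P.fq.gz
        rep=${left_name%_1.P.fq.gz}             # B251
        cond=${rep%?}                           # B25
        right_name="${rep}_2.P.fq.gz"
        if [ ! -e "$dir/$right_name" ]; then
            echo "warning: no mate file for $left_name" >&2
            continue
        fi
        printf '%s\t%s\t%s\t%s\n' "$cond" "$rep" "$left_name" "$right_name"
    done
}

# Usage, from inside the assembly directory:
# make_samples . > samples.txt
```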
# Then run Trinity
Trinity --seqType fq --max_memory 60G --samples_file samples.txt --CPU 6
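Parameters Trinity needs:
--seqType : the type of the input reads (fq or fa)
--max_memory : the maximum memory to use during the run (set it according to what your machine can spare)
--samples_file samples.txt : the tab-delimited sample information table
--CPU : the number of CPUs to use during the run (again, according to what your machine can spare)
As a rule of thumb, pair each CPU with about 10G of memory.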
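The 10G-per-CPU rule of thumb can be turned into a small calculation. This is a sketch only: `pick_resources` is a hypothetical helper, the rule itself is this post's guideline rather than a Trinity requirement, and the 64G / 8-core machine in the example is made up.

```shell
# pick_resources <total_mem_gb> <n_cpus>
# Prints "<cpu> <mem>G" following the 10G-per-CPU rule of thumb,
# never exceeding the available memory or CPU count.
pick_resources() {
    total_mem_gb="$1"
    n_cpus="$2"
    use_cpu=$(( total_mem_gb / 10 ))        # CPUs the memory can feed
    if [ "$use_cpu" -gt "$n_cpus" ]; then use_cpu="$n_cpus"; fi
    if [ "$use_cpu" -lt 1 ]; then use_cpu=1; fi
    use_mem=$(( use_cpu * 10 ))
    if [ "$use_mem" -gt "$total_mem_gb" ]; then use_mem="$total_mem_gb"; fi
    echo "$use_cpu ${use_mem}G"
}

# Example: a hypothetical machine with 64G of RAM and 8 cores
res=$(pick_resources 64 8)
echo "Trinity --seqType fq --max_memory ${res#* } --samples_file samples.txt --CPU ${res% *}"
```

For this example the printed command uses 60G and 6 CPUs, the same values as the Trinity call in this section.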
Note also that the assembly takes a long time, so it is advisable to keep it running inside a session manager such as screen.
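With screen, the usual pattern is to open a named session with `screen -S <name>`, start Trinity inside it, detach with Ctrl-a d, and come back later with `screen -r <name>`. Where screen is not installed, nohup is a plainer alternative. The sketch below uses nohup, not screen, and `run_detached` is a hypothetical helper.

```shell
# run_detached <logfile> <command...>
# Start a command in the background, immune to hangups, with all output
# going to <logfile>. The PID is stored in DETACHED_PID.
# NOTE: this uses nohup as an alternative to screen, not screen itself.
run_detached() {
    log="$1"
    shift
    nohup "$@" > "$log" 2>&1 &
    DETACHED_PID=$!
}

# Example for the assembly (uncomment to use):
# run_detached trinity.log Trinity --seqType fq --max_memory 60G --samples_file samples.txt --CPU 6
# tail -f trinity.log      # follow progress; logging out does not stop the run
```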
When the assembly finishes you get a FASTA file:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ l
RSEMout/ Salmonout/ Trinity.fasta* Trinity.fasta.gene_trans_map*
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ head Trinity.fasta
>TRINITY_DN104555_c0_g1_i1 len=281 path=[0:0-280]
ACTGAGCTAAAATAAGACTTATATACGTAACTTTTTTTATTCAGTCAACATATGAAACTCAAGTTCAACCATCCAAGACATGAGCTTGTACCTTATTATGAATATTTTCTTTGGACAGAAAAAATAACACTTCAAAACCTCAACCATTTCCAAGTTTTTAGACATGCAAAAAGAGCAACATCATCCCCCCACTCTATTTGTGGAACGGTGTTCCAGATGCCTAAACTCGACATCACCCCTCCCCCACAATAAGTTCAGACTAAAAAAAGGGCAAAAAATTA
>TRINITY_DN104630_c0_g1_i1 len=376 path=[0:0-375]
TTTATGGCTGATAAATCGGCACATATGTTCGGTGCTTGTTGATTCTCCATGAGCTCGTTGAAGGGTAGAAGTTTAATTTGATTTGTTGAAGACTTGAAAATGTGGTTTTATTAAGGGTCGCATAGGCTTGATAATGATCGGGTTTGCGCCACGAGCAATTCCACGTGATGAATGTTCTCTATCTGGAAGTTGGTGAAAATGTCAGCTATTTAGCAACTTGATGACTCTTCATGTTTTGACAACTTCTAAGCTTGAAGTTCATTAGAAACTGACTATTTGTGAGCTTAGTAGTTCTTCACAAGTGTTTTTGAGACATTTGATATTTCGGAAGTAATTTGTTCTCTCTACCTCAAAGCCCCAATTTTCACTTTCTCTG
>TRINITY_DN104553_c0_g1_i1 len=222 path=[0:0-221]
TCCTTCCCAGAGAAAAACGACCCTTCATATTTGGAAGCCATCCATTACAGCATGCCGCCCTCGCTGCTACAGTTTCTTCACTGAAAGTCGTCTCCTTTTCTTTTATTTGCTCGTCTGCCTTGACCAGCCAATCAATAACTTGGCCACCAATAATCATTGAATTTTCGCTCGCACCTTTCTCCTCCAGCTCTATTCCATCTTTTTTGTGACGCAGCTGCTGAA
>TRINITY_DN104629_c0_g1_i1 len=266 path=[0:0-265]
CATTCTGGGTTTGGGGTTGAGTTATTGTGTTTATCATTAGTTATTGTGTTGATCAAATGAGTGATATATCACAATCATTGTCAAAGCTGAAGCCTTCATCATTAATCCGTTTCGGGTATTGGATTTTGTGTTTAAGGATTAAGTGGGGGTTTAAAGTTAAGGGAAATCGGTGGGAAGCTGAAGGTGTGGAAGGAAGAACAACACAAAAATGAAGGTTTGAGTTGGAGTAAAAATGTTGGAAATATTGAAACTATGGCTCCTACTCT
>TRINITY_DN104606_c0_g1_i1 len=298 path=[0:0-297]
TCAATGAAGGAATCAGTTTAATTGCTCTATGCTAGTTACACTTCAATTTTTTTGATAGAGTTAACTTATTCTAATGAATGGGTCTTATAGAGGGGAAGATTCAATTTAGGGCCAAGTATGTACCTATGTGCACTTTATGTCGTATGCCTAGTATTGTATTGTGTATTCTTATGCTTTCACTTCCATACAGTCATATTTTTTTTCTCTAAGGAATCCATCATTTTTGGCAATGCAGATTTGTATTCTTGATTATTAATAGAAAAAAAAAAAATCCTTTCTGATTGTTTCTGTTCAGATT
It holds the assembled transcripts, which we will annotate and classify in later steps.
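A quick sanity check on the assembly is to count transcripts and genes from the FASTA headers. The sketch below assumes headers of the form shown above (`>TRINITY_DN104555_c0_g1_i1 len=...`) and assumes that the gene ID is the transcript ID without the trailing `_i<N>` isoform suffix; `fasta_counts` is a hypothetical helper.

```shell
# fasta_counts <fasta>: print the number of transcripts and of Trinity "genes".
# ASSUMPTION: gene ID = transcript ID without the trailing _i<N> suffix.
fasta_counts() {
    awk '
        /^>/ {
            id = substr($1, 2)          # drop the leading ">"
            n_tx++
            sub(/_i[0-9]+$/, "", id)    # transcript ID -> gene ID
            if (!(id in seen)) { seen[id] = 1; n_gene++ }
        }
        END { printf "transcripts\t%d\ngenes\t%d\n", n_tx, n_gene }
    ' "$1"
}

# Usage:
# fasta_counts Trinity.fasta
```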
Here we start with a first processing step: finding the open reading frames (ORFs) of these transcripts.
The tool used is TransDecoder.
Downloading and installing TransDecoder:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ wget https://github.com/TransDecoder/TransDecoder/archive/TransDecoder-v5.5.0.tar.gz
--2019-02-02 16:51:50-- https://github.com/TransDecoder/TransDecoder/archive/TransDecoder-v5.5.0.tar.gz
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://codeload.github.com/TransDecoder/TransDecoder/tar.gz/TransDecoder-v5.5.0 [following]
--2019-02-02 16:51:51-- https://codeload.github.com/TransDecoder/TransDecoder/tar.gz/TransDecoder-v5.5.0
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘TransDecoder-v5.5.0.tar.gz’
TransDecoder-v5.5.0.tar.g [ <=> ] 15.02M 3.84MB/s in 4.5s
2019-02-02 16:51:58 (3.33 MB/s) - ‘TransDecoder-v5.5.0.tar.gz’ saved [15748671]
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ l
RSEMout/ Salmonout/ TransDecoder-v5.5.0.tar.gz Trinity.fasta* Trinity.fasta.gene_trans_map*
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ tar zxvf TransDecoder-v5.5.0.tar.gz
TransDecoder-TransDecoder-v5.5.0/
TransDecoder-TransDecoder-v5.5.0/.gitmodules
TransDecoder-TransDecoder-v5.5.0/Changelog.txt
TransDecoder-TransDecoder-v5.5.0/LICENSE.txt
TransDecoder-TransDecoder-v5.5.0/Makefile
TransDecoder-TransDecoder-v5.5.0/PerlLib/
TransDecoder-TransDecoder-v5.5.0/PerlLib/DelimParser.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Fasta_reader.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Fasta_retriever.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/GFF3_utils2.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/GTF.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/GTF_utils2.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Gene_obj.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Longest_orf.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Nuc_translator.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Overlap_piler.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/PWM.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Pipeliner.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Process_cmd.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/overlapping_nucs.ph
TransDecoder-TransDecoder-v5.5.0/README.md
TransDecoder-TransDecoder-v5.5.0/TransDecoder.LongOrfs
TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict
TransDecoder-TransDecoder-v5.5.0/TransDecoder.lrgTests/
TransDecoder-TransDecoder-v5.5.0/TransDecoder.wiki/
TransDecoder-TransDecoder-v5.5.0/__testing/
TransDecoder-TransDecoder-v5.5.0/__testing/Makefile
TransDecoder-TransDecoder-v5.5.0/__testing/__test.best_w_homology.expected
TransDecoder-TransDecoder-v5.5.0/__testing/__test.simplest.expected
TransDecoder-TransDecoder-v5.5.0/__testing/__test.single_best.expected
TransDecoder-TransDecoder-v5.5.0/__testing/__test.wBlastNPfam.expected
TransDecoder-TransDecoder-v5.5.0/__testing/blastp.outfmt6
TransDecoder-TransDecoder-v5.5.0/__testing/longest_orfs.cds.scores
TransDecoder-TransDecoder-v5.5.0/__testing/longest_orfs.gff3
TransDecoder-TransDecoder-v5.5.0/__testing/pfam.domtblout
TransDecoder-TransDecoder-v5.5.0/notes
TransDecoder-TransDecoder-v5.5.0/sample_data/
TransDecoder-TransDecoder-v5.5.0/sample_data/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/README
TransDecoder-TransDecoder-v5.5.0/sample_data/README.md
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/mini_Pfam-A.hmm.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/mini_sprot.db.pep.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/test.genome.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/test.tophat.sam.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/transcripts.gtf.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/genome.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/pasa_assemblies.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/pasa_assemblies.gff3.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/pasa_assemblies_described.txt.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/Trinity.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/genome_alignments.gmap.gff3.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/stringtie_merged.gtf
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/stringtie_merged.transcripts.fasta
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/supertranscripts.fasta
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/supertranscripts.gtf
TransDecoder-TransDecoder-v5.5.0/util/
TransDecoder-TransDecoder-v5.5.0/util/PWM/
TransDecoder-TransDecoder-v5.5.0/util/PWM/README.md
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/build_atgPWM.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/build_pwm.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/score_atgPWM.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/build_atgPWM_+-.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/compute_AUC.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/deplete_feature_noise.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/feature_scores_to_ROC.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/feature_scoring.+-.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/make_seqLogo.Rscript
TransDecoder-TransDecoder-v5.5.0/util/PWM/plot_ROC.Rscript
TransDecoder-TransDecoder-v5.5.0/util/PWM/simulate_feature_seq_from_PWM.pl
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/cleanMe.sh
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/longest_orfs.cds.top_longest_5000.nr80.gz
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/pasa_assemblies.fasta.gz
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/runMe.sh
TransDecoder-TransDecoder-v5.5.0/util/bin/
TransDecoder-TransDecoder-v5.5.0/util/bin/.hidden
TransDecoder-TransDecoder-v5.5.0/util/cdna_alignment_orf_to_genome_orf.pl
TransDecoder-TransDecoder-v5.5.0/util/compute_base_probs.pl
TransDecoder-TransDecoder-v5.5.0/util/exclude_similar_proteins.pl
TransDecoder-TransDecoder-v5.5.0/util/fasta_prot_checker.pl
TransDecoder-TransDecoder-v5.5.0/util/ffindex_resume.pl
TransDecoder-TransDecoder-v5.5.0/util/gene_list_to_gff.pl
TransDecoder-TransDecoder-v5.5.0/util/get_FL_accs.pl
TransDecoder-TransDecoder-v5.5.0/util/get_longest_ORF_per_transcript.pl
TransDecoder-TransDecoder-v5.5.0/util/get_top_longest_fasta_entries.pl
TransDecoder-TransDecoder-v5.5.0/util/gff3_file_to_bed.pl
TransDecoder-TransDecoder-v5.5.0/util/gff3_file_to_proteins.pl
TransDecoder-TransDecoder-v5.5.0/util/gff3_gene_to_gtf_format.pl
TransDecoder-TransDecoder-v5.5.0/util/gtf_genome_to_cdna_fasta.pl
TransDecoder-TransDecoder-v5.5.0/util/gtf_to_alignment_gff3.pl
TransDecoder-TransDecoder-v5.5.0/util/gtf_to_bed.pl
TransDecoder-TransDecoder-v5.5.0/util/misc/
TransDecoder-TransDecoder-v5.5.0/util/misc/__init__.py
TransDecoder-TransDecoder-v5.5.0/util/misc/get_FP_FN_scores.py
TransDecoder-TransDecoder-v5.5.0/util/misc/plot_indiv_seq_likelihood_profile.py
TransDecoder-TransDecoder-v5.5.0/util/misc/rpart_scores.Rscript
TransDecoder-TransDecoder-v5.5.0/util/misc/select_TD_orfs.py
TransDecoder-TransDecoder-v5.5.0/util/nr_ORFs_gff3.pl
TransDecoder-TransDecoder-v5.5.0/util/pfam_mpi.pbs
TransDecoder-TransDecoder-v5.5.0/util/pfam_runner.pl
TransDecoder-TransDecoder-v5.5.0/util/refine_gff3_group_iso_strip_utrs.pl
TransDecoder-TransDecoder-v5.5.0/util/refine_hexamer_scores.pl
TransDecoder-TransDecoder-v5.5.0/util/remove_eclipsed_ORFs.pl
TransDecoder-TransDecoder-v5.5.0/util/score_CDS_likelihood_all_6_frames.pl
TransDecoder-TransDecoder-v5.5.0/util/select_best_ORFs_per_transcript.pl
TransDecoder-TransDecoder-v5.5.0/util/seq_n_baseprobs_to_loglikelihood_vals.pl
TransDecoder-TransDecoder-v5.5.0/util/start_codon_refinement.pl
TransDecoder-TransDecoder-v5.5.0/util/train_start_PWM.pl
TransDecoder-TransDecoder-v5.5.0/util/uri_unescape.pl
Run it:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ ./TransDecoder-TransDecoder-v5.5.0/TransDecoder.LongOrfs -t Trinity.fasta
* Running CMD: /home/yeyuntian/Biodata/trinitytest/downstr/TransDecoder-TransDecoder-v5.5.0/util/compute_base_probs.pl Trinity.fasta 0 > /home/yeyuntian/Biodata/trinitytest/downstr/Trinity.fasta.transdecoder_dir/base_freqs.dat
-first extracting base frequencies, we'll need them later.
- extracting ORFs from transcripts.
-total transcripts to examine: 220498
[220400/220498] = 99.96% done CMD: touch /home/yeyuntian/Biodata/trinitytest/downstr/Trinity.fasta.transdecoder_dir.__checkpoints_longorfs/TD.longorfs.ok
#################################
### Done preparing long ORFs. ###
##################################
Use file: /home/yeyuntian/Biodata/trinitytest/downstr/Trinity.fasta.transdecoder_dir/longest_orfs.pep for Pfam and/or BlastP searches to enable homology-based coding region identification.
Then, run TransDecoder.Predict for your final coding region predictions.
This creates the output directories:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ l -alt
total 264404
drwxrwxr-x 2 yeyuntian yeyuntian 4096 2月 2 17:15 Trinity.fasta.transdecoder_dir.__checkpoints_longorfs/
drwxrwxr-x 2 yeyuntian yeyuntian 4096 2月 2 17:09 Trinity.fasta.transdecoder_dir/
drwxrwxr-x 7 yeyuntian yeyuntian 4096 2月 2 17:07 ./
-rw-rw-r-- 1 yeyuntian yeyuntian 212 2月 2 17:07 pipeliner.4094.cmds
-rw-rw-r-- 1 yeyuntian yeyuntian 15748671 2月 2 16:51 TransDecoder-v5.5.0.tar.gz
drwxrwxr-x 8 yeyuntian yeyuntian 4096 10月 22 20:45 TransDecoder-TransDecoder-v5.5.0/
Then run the next command:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ ./TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t Trinity.fasta
This may fail with an error saying that seqLogo does not exist, because the command calls an R package; it can be installed from Bioconductor:
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("seqLogo", version = "3.8")
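Once seqLogo is installed, the two TransDecoder steps can be chained so that Predict only runs if LongOrfs succeeded. This is a minimal sketch that uses only the `-t` option shown in this post; `run_transdecoder` is a hypothetical wrapper.

```shell
# run_transdecoder <transdecoder_dir> <transcripts.fasta>
# Runs TransDecoder.LongOrfs, then TransDecoder.Predict, and stops at the
# first step that fails.
run_transdecoder() {
    td_dir="$1"
    fasta="$2"
    if [ ! -s "$fasta" ]; then
        echo "error: $fasta is missing or empty" >&2
        return 1
    fi
    "$td_dir/TransDecoder.LongOrfs" -t "$fasta" || return 1
    "$td_dir/TransDecoder.Predict"  -t "$fasta" || return 1
    echo "done: see $(basename "$fasta").transdecoder.pep in the current directory"
}

# Usage:
# run_transdecoder ./TransDecoder-TransDecoder-v5.5.0 Trinity.fasta
```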
Finally, these are the files the tool produces:
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ ll -alt
total 558620
-rw-rw-r-- 1 yeyuntian yeyuntian 3016 2月 2 18:17 pipeliner.7404.cmds
drwxrwxr-x 8 yeyuntian yeyuntian 4096 2月 2 18:17 ./
drwxrwxr-x 2 yeyuntian yeyuntian 4096 2月 2 17:43 Trinity.fasta.transdecoder_dir.__checkpoints/
-rw-rw-r-- 1 yeyuntian yeyuntian 133585627 2月 2 17:43 Trinity.fasta.transdecoder.cds
-rw-rw-r-- 1 yeyuntian yeyuntian 3016 2月 2 17:43 pipeliner.5796.cmds
-rw-rw-r-- 1 yeyuntian yeyuntian 56068696 2月 2 17:43 Trinity.fasta.transdecoder.pep
-rw-rw-r-- 1 yeyuntian yeyuntian 19376844 2月 2 17:42 Trinity.fasta.transdecoder.bed
-rw-rw-r-- 1 yeyuntian yeyuntian 92216622 2月 2 17:42 Trinity.fasta.transdecoder.gff3
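Where:
.pep (the protein sequences encoded by the final candidate ORFs)
.cds (the nucleotide sequences of the coding regions)
.gff3 (the positions of the ORFs within the transcripts)
.bed (for later visualization in IGV)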
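A first look at the .pep file is to tally how many predicted ORFs are complete versus partial. The sketch below assumes that each header carries a `type:<name>` token (for example type:complete, type:5prime_partial, type:3prime_partial or type:internal); that header layout is an assumption to check against your own file, and `orf_types` is a hypothetical helper.

```shell
# orf_types <file.pep>: tally ORF types found in the FASTA headers.
# ASSUMPTION: each header contains one "type:<name>" token.
orf_types() {
    grep '^>' "$1" \
        | grep -o 'type:[A-Za-z0-9_]*' \
        | sort \
        | uniq -c \
        | awk '{ sub(/^type:/, "", $2); printf "%s\t%d\n", $2, $1 }'
}

# Usage:
# orf_types Trinity.fasta.transdecoder.pep
```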