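Transcriptome Analysis in Practice, Part 2: De Novo Transcriptome Assembly Without a Reference Genome

Trinity is a widely used transcriptome assembly tool that does not depend on a reference genome.

In this part we use Trinity on a server to assemble the clean data obtained in Part 1.

Along the way we cover software installation, environment variable configuration, and the assembly of transcriptome reads.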

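1. Installing Trinity

Download the software from the Trinity GitHub repository to your local machine.
Then copy it to the server with the scp command.
Once it is there, unpack the archive, enter the directory, and run make.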
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ make
# make here depends on cmake; if cmake is missing, install it yourself and add it to PATH
# Output like the following means the build succeeded
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Performing Unit Tests of Build
 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Inchworm:                has been Installed Properly
Chrysalis:               has been Installed Properly
QuantifyGraph:           has been Installed Properly
GraphFromFasta:          has been Installed Properly
ReadsToTranscripts:      has been Installed Properly
parafly:                 has been Installed Properly
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ 

2. Configuring the PATH environment variable

yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ export PATH=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3:$PATH
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ cd ..
yeyt@ubuntu:~/biosoft$ which Trinity 
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3/Trinity
yeyt@ubuntu:~/biosoft$ Trinity -h
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = "en_US:en",
    LC_ALL = (unset),
    LC_PAPER = "zh_CN.UTF-8",
    LC_ADDRESS = "zh_CN.UTF-8",
    LC_MONETARY = "zh_CN.UTF-8",
    LC_NUMERIC = "zh_CN.UTF-8",
    LC_TELEPHONE = "zh_CN.UTF-8",
    LC_IDENTIFICATION = "zh_CN.UTF-8",
    LC_MEASUREMENT = "zh_CN.UTF-8",
    LC_TIME = "zh_CN.UTF-8",
    LC_NAME = "zh_CN.UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").



###############################################################################
#

     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

    Trinity-v2.8.3


#
#
# Required:
#
#  --seqType <string>      :type of reads: ('fa' or 'fq')
#
#  --max_memory <string>      :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
#                            provided in Gb of RAM, ie.  '--max_memory 10G'
#
#  If paired reads:
#      --left  <string>    :left reads, one or more file names (separated by commas, no spaces)
#      --right <string>    :right reads, one or more file names (separated by commas, no spaces)
#
#  Or, if unpaired reads:
#      --single <string>   :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
#  Or,
#      --samples_file <string>         tab-delimited text file indicating biological replicate relationships.
#                                   ex.
#                                        cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
#                                        cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
#                                        cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
#                                        cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
#
#                      # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ make plugins
# Install the plugins
## Checking plugin installations:

slclust:                 has been Installed Properly
collectl:                has been Installed Properly
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ 
# That completes the build; next, check the software Trinity depends on
#bowtie2
#jellyfish
#salmon
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ which bowtie2
/opt/biosoft/bowtie2-2.2.9//bowtie2
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ which jellyfish 
/opt/biosoft/jellyfish-2.2.3/bin//jellyfish
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ which salmon 
/opt/biosoft/salmon/bin//salmon
# If nothing is missing, carry on
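The three `which` calls above can be wrapped into one reusable check. This is only a sketch: `check_tools` is a helper name made up for this post, not part of Trinity, and it tests only that each tool is on PATH, not that its version is one Trinity accepts.

```shell
# Report which of the named tools can be found on PATH.
# check_tools is a hypothetical helper, not something Trinity ships.
# Prints one line per tool; the exit status is non-zero if any tool is missing.
check_tools() (
    missing=0
    for tool in "$@"; do
        if location=$(command -v "$tool"); then
            printf 'found    %s -> %s\n' "$tool" "$location"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
)

# The three tools checked by hand above; on failure only a hint is printed.
check_tools bowtie2 jellyfish salmon ||
    echo "install the missing tools and add them to PATH"
```

The function body runs in a subshell, so its variables do not leak into the interactive session.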
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ pwd
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ export TRINITY_HOME=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ echo $TRINITY_HOME 
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3
# Make the environment variables permanent
yeyt@ubuntu:~/biosoft/trinityrnaseq-Trinity-v2.8.3$ echo 'export TRINITY_HOME=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3' >> ~/.bashrc
yeyt@ubuntu:~/biosoft$ echo 'export PATH=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3:$PATH' >> ~/.bashrc 
yeyt@ubuntu:~/biosoft$ source ~/.bashrc
yeyt@ubuntu:~/biosoft$ which Trinity 
/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3/Trinity
Trinity can now be called directly by name.
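Running `echo ... >> ~/.bashrc` more than once leaves duplicate lines in the file. A minimal sketch of a way to avoid that is below; `append_once` is a hypothetical helper name, and the demo writes to a temporary file so nothing real is modified.

```shell
# Append LINE to FILE only if that exact line is not already present.
# append_once is a hypothetical helper written for this post.
append_once() (
    file=$1
    line=$2
    touch "$file"
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
)

# Demo on a temporary file; pass "$HOME/.bashrc" instead to use it for real.
demo_rc=$(mktemp)
append_once "$demo_rc" 'export TRINITY_HOME=/home/yeyt/biosoft/trinityrnaseq-Trinity-v2.8.3'
append_once "$demo_rc" 'export PATH=$TRINITY_HOME:$PATH'
append_once "$demo_rc" 'export PATH=$TRINITY_HOME:$PATH'   # repeated call adds nothing
cat "$demo_rc"
```

The single quotes keep `$TRINITY_HOME` and `$PATH` unexpanded, so they are resolved each time the shell starts rather than frozen at the moment of writing.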

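3. Running Trinity

Here we follow the method published in Nature Protocols
1) Quality control and cleaning of the data (covered in the previous part)
2) Assembly of the transcriptome reads

First, build the sample information table
My samples cover three treatments with two biological replicates each, and every replicate has two read files (_1 and _2), so the table looks like this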

yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly$ l
B251_1.P.fq.gz  B252_2.P.fq.gz  R252_1.P.fq.gz  W251_2.P.fq.gz  samples.txt
B251_2.P.fq.gz  R251_1.P.fq.gz  R252_2.P.fq.gz  W252_1.P.fq.gz
B252_1.P.fq.gz  R251_2.P.fq.gz  W251_1.P.fq.gz  W252_2.P.fq.gz
yeyt@ubuntu:~/biodata/NH160034/NH160034/cleandata/assembly$ cat samples.txt 
B25 B251    B251_1.P.fq.gz  B251_2.P.fq.gz
B25 B252    B252_1.P.fq.gz  B252_2.P.fq.gz
R25 R251    R251_1.P.fq.gz  R251_2.P.fq.gz
R25 R252    R252_1.P.fq.gz  R252_2.P.fq.gz
W25 W251    W251_1.P.fq.gz  W251_2.P.fq.gz
W25 W252    W252_1.P.fq.gz  W252_2.P.fq.gz
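Writing samples.txt by hand is easy to get wrong, and the Trinity help text shown earlier says the file must be tab-delimited. Below is a sketch that generates it from the file names. It assumes the naming used in this post (`<sample>_1.P.fq.gz` and `<sample>_2.P.fq.gz`) and that the condition name is the sample name minus its last character (B251 -> B25). `make_samples_file` is a hypothetical helper, and the demo uses empty placeholder files in a temporary directory.

```shell
# Print a tab-delimited Trinity samples table for the read pairs found in DIR.
# Assumptions (from this post's file names, not from Trinity itself):
#   reads are named <sample>_1.P.fq.gz / <sample>_2.P.fq.gz
#   condition = sample name without its last character (B251 -> B25)
make_samples_file() (
    dir=$1
    for left in "$dir"/*_1.P.fq.gz; do
        [ -e "$left" ] || continue            # no match: the glob stays literal
        left=${left##*/}                      # strip the directory part
        sample=${left%_1.P.fq.gz}
        right=${sample}_2.P.fq.gz
        [ -e "$dir/$right" ] || { echo "missing mate for $left" >&2; exit 1; }
        printf '%s\t%s\t%s\t%s\n' "${sample%?}" "$sample" "$left" "$right"
    done
)

# Demo with empty placeholder files in a temporary directory.
demo=$(mktemp -d)
for s in B251 B252 R251 R252 W251 W252; do
    : > "$demo/${s}_1.P.fq.gz"
    : > "$demo/${s}_2.P.fq.gz"
done
make_samples_file "$demo" > "$demo/samples.txt"
cat "$demo/samples.txt"
```

The table lists bare file names, so Trinity has to be started from the directory that holds the reads, as is done in this post.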
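# Then run Trinity
Trinity --seqType fq --max_memory 60G --samples_file samples.txt --CPU 6 
Parameters Trinity needs

--seqType : the type of the input data (fq or fa)
--max_memory : the maximum memory to use during the run (set it according to what your machine can spare)
--samples_file samples.txt : the sample information table
--CPU : the number of CPUs to use during the run (again, according to your machine)
As a rule of thumb, pair each CPU with about 10G of memory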
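The rule of thumb of one CPU per 10G can be turned into a small calculation. This is a sketch only: `suggest_cpu` is a hypothetical helper, the 10G ratio is the heuristic from this post rather than a Trinity requirement, and the demo prints the command instead of running it.

```shell
# Suggest a --CPU value from a memory budget in GB:
# one CPU per 10G, at least 1, and never more than the cores available.
# suggest_cpu is a hypothetical helper written for this post.
suggest_cpu() (
    mem_gb=$1
    cores=$2
    cpu=$(( mem_gb / 10 ))
    if [ "$cpu" -lt 1 ]; then cpu=1; fi
    if [ "$cpu" -gt "$cores" ]; then cpu=$cores; fi
    echo "$cpu"
)

mem_gb=60
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
# Print the resulting command line for inspection rather than executing it.
echo "Trinity --seqType fq --max_memory ${mem_gb}G --samples_file samples.txt --CPU $(suggest_cpu "$mem_gb" "$cores")"
```

With 60G and at least six cores this reproduces the `--CPU 6` used above.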

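Note also that the assembly takes a long time, so it is best to run it inside a screen session so that it keeps going after you disconnect
When the assembly finishes you get a FASTA file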
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ l
RSEMout/  Salmonout/  Trinity.fasta*  Trinity.fasta.gene_trans_map*
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ head Trinity.fasta
>TRINITY_DN104555_c0_g1_i1 len=281 path=[0:0-280]
ACTGAGCTAAAATAAGACTTATATACGTAACTTTTTTTATTCAGTCAACATATGAAACTCAAGTTCAACCATCCAAGACATGAGCTTGTACCTTATTATGAATATTTTCTTTGGACAGAAAAAATAACACTTCAAAACCTCAACCATTTCCAAGTTTTTAGACATGCAAAAAGAGCAACATCATCCCCCCACTCTATTTGTGGAACGGTGTTCCAGATGCCTAAACTCGACATCACCCCTCCCCCACAATAAGTTCAGACTAAAAAAAGGGCAAAAAATTA
>TRINITY_DN104630_c0_g1_i1 len=376 path=[0:0-375]
TTTATGGCTGATAAATCGGCACATATGTTCGGTGCTTGTTGATTCTCCATGAGCTCGTTGAAGGGTAGAAGTTTAATTTGATTTGTTGAAGACTTGAAAATGTGGTTTTATTAAGGGTCGCATAGGCTTGATAATGATCGGGTTTGCGCCACGAGCAATTCCACGTGATGAATGTTCTCTATCTGGAAGTTGGTGAAAATGTCAGCTATTTAGCAACTTGATGACTCTTCATGTTTTGACAACTTCTAAGCTTGAAGTTCATTAGAAACTGACTATTTGTGAGCTTAGTAGTTCTTCACAAGTGTTTTTGAGACATTTGATATTTCGGAAGTAATTTGTTCTCTCTACCTCAAAGCCCCAATTTTCACTTTCTCTG
>TRINITY_DN104553_c0_g1_i1 len=222 path=[0:0-221]
TCCTTCCCAGAGAAAAACGACCCTTCATATTTGGAAGCCATCCATTACAGCATGCCGCCCTCGCTGCTACAGTTTCTTCACTGAAAGTCGTCTCCTTTTCTTTTATTTGCTCGTCTGCCTTGACCAGCCAATCAATAACTTGGCCACCAATAATCATTGAATTTTCGCTCGCACCTTTCTCCTCCAGCTCTATTCCATCTTTTTTGTGACGCAGCTGCTGAA
>TRINITY_DN104629_c0_g1_i1 len=266 path=[0:0-265]
CATTCTGGGTTTGGGGTTGAGTTATTGTGTTTATCATTAGTTATTGTGTTGATCAAATGAGTGATATATCACAATCATTGTCAAAGCTGAAGCCTTCATCATTAATCCGTTTCGGGTATTGGATTTTGTGTTTAAGGATTAAGTGGGGGTTTAAAGTTAAGGGAAATCGGTGGGAAGCTGAAGGTGTGGAAGGAAGAACAACACAAAAATGAAGGTTTGAGTTGGAGTAAAAATGTTGGAAATATTGAAACTATGGCTCCTACTCT
>TRINITY_DN104606_c0_g1_i1 len=298 path=[0:0-297]
TCAATGAAGGAATCAGTTTAATTGCTCTATGCTAGTTACACTTCAATTTTTTTGATAGAGTTAACTTATTCTAATGAATGGGTCTTATAGAGGGGAAGATTCAATTTAGGGCCAAGTATGTACCTATGTGCACTTTATGTCGTATGCCTAGTATTGTATTGTGTATTCTTATGCTTTCACTTCCATACAGTCATATTTTTTTTCTCTAAGGAATCCATCATTTTTGGCAATGCAGATTTGTATTCTTGATTATTAATAGAAAAAAAAAAAATCCTTTCTGATTGTTTCTGTTCAGATT
This file is the transcript database; later we will annotate and classify it in a series of steps.
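Before going further it helps to know how many transcripts and genes the assembly contains. The sketch below relies on the Trinity naming visible in the headers above, where the part before `_i<number>` identifies the gene and the `_i<number>` suffix identifies the isoform. `count_trinity` is a hypothetical helper, and the demo file holds made-up toy records, not real assembly output.

```shell
# Count transcripts (FASTA records) and genes (unique IDs without the _iN suffix).
# count_trinity is a hypothetical helper written for this post.
count_trinity() (
    fasta=$1
    transcripts=$(grep -c '^>' "$fasta")
    genes=$(grep '^>' "$fasta" | cut -d' ' -f1 | sed 's/_i[0-9]*$//' | sort -u | grep -c '')
    echo "transcripts=$transcripts genes=$genes"
)

# Toy input: two isoforms of one gene plus a single-isoform gene.
toy=$(mktemp)
printf '%s\n' \
    '>TRINITY_DN1_c0_g1_i1 len=4 path=[0:0-3]' 'ACGT' \
    '>TRINITY_DN1_c0_g1_i2 len=4 path=[0:0-3]' 'ACGA' \
    '>TRINITY_DN2_c0_g1_i1 len=4 path=[0:0-3]' 'TTTT' > "$toy"
count_trinity "$toy"
```

On real data, call it as `count_trinity Trinity.fasta`.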
For now we do some preliminary processing: finding the open reading frames (ORFs) of these transcripts
The tool used is TransDecoder

Downloading and installing TransDecoder

Download the release archive from GitHub

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ wget https://github.com/TransDecoder/TransDecoder/archive/TransDecoder-v5.5.0.tar.gz
--2019-02-02 16:51:50--  https://github.com/TransDecoder/TransDecoder/archive/TransDecoder-v5.5.0.tar.gz
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 302 Found
Location: https://codeload.github.com/TransDecoder/TransDecoder/tar.gz/TransDecoder-v5.5.0 [following]
--2019-02-02 16:51:51--  https://codeload.github.com/TransDecoder/TransDecoder/tar.gz/TransDecoder-v5.5.0
Connecting to 127.0.0.1:8118... connected.
Proxy request sent, awaiting response... 200 OK
Length: unspecified [application/x-gzip]
Saving to: ‘TransDecoder-v5.5.0.tar.gz’

TransDecoder-v5.5.0.tar.g     [                  <=>              ]  15.02M  3.84MB/s    in 4.5s    

2019-02-02 16:51:58 (3.33 MB/s) - ‘TransDecoder-v5.5.0.tar.gz’ saved [15748671]

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ l
RSEMout/  Salmonout/  TransDecoder-v5.5.0.tar.gz  Trinity.fasta*  Trinity.fasta.gene_trans_map*
yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ tar zxvf TransDecoder-v5.5.0.tar.gz 
TransDecoder-TransDecoder-v5.5.0/
TransDecoder-TransDecoder-v5.5.0/.gitmodules
TransDecoder-TransDecoder-v5.5.0/Changelog.txt
TransDecoder-TransDecoder-v5.5.0/LICENSE.txt
TransDecoder-TransDecoder-v5.5.0/Makefile
TransDecoder-TransDecoder-v5.5.0/PerlLib/
TransDecoder-TransDecoder-v5.5.0/PerlLib/DelimParser.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Fasta_reader.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Fasta_retriever.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/GFF3_utils2.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/GTF.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/GTF_utils2.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Gene_obj.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Longest_orf.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Nuc_translator.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Overlap_piler.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/PWM.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Pipeliner.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/Process_cmd.pm
TransDecoder-TransDecoder-v5.5.0/PerlLib/overlapping_nucs.ph
TransDecoder-TransDecoder-v5.5.0/README.md
TransDecoder-TransDecoder-v5.5.0/TransDecoder.LongOrfs
TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict
TransDecoder-TransDecoder-v5.5.0/TransDecoder.lrgTests/
TransDecoder-TransDecoder-v5.5.0/TransDecoder.wiki/
TransDecoder-TransDecoder-v5.5.0/__testing/
TransDecoder-TransDecoder-v5.5.0/__testing/Makefile
TransDecoder-TransDecoder-v5.5.0/__testing/__test.best_w_homology.expected
TransDecoder-TransDecoder-v5.5.0/__testing/__test.simplest.expected
TransDecoder-TransDecoder-v5.5.0/__testing/__test.single_best.expected
TransDecoder-TransDecoder-v5.5.0/__testing/__test.wBlastNPfam.expected
TransDecoder-TransDecoder-v5.5.0/__testing/blastp.outfmt6
TransDecoder-TransDecoder-v5.5.0/__testing/longest_orfs.cds.scores
TransDecoder-TransDecoder-v5.5.0/__testing/longest_orfs.gff3
TransDecoder-TransDecoder-v5.5.0/__testing/pfam.domtblout
TransDecoder-TransDecoder-v5.5.0/notes
TransDecoder-TransDecoder-v5.5.0/sample_data/
TransDecoder-TransDecoder-v5.5.0/sample_data/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/README
TransDecoder-TransDecoder-v5.5.0/sample_data/README.md
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/mini_Pfam-A.hmm.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/mini_sprot.db.pep.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/test.genome.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/test.tophat.sam.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/cufflinks_example/transcripts.gtf.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/genome.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/pasa_assemblies.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/pasa_assemblies.gff3.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/pasa_assemblies_described.txt.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/pasa_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/Trinity.fasta.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/genome_alignments.gmap.gff3.gz
TransDecoder-TransDecoder-v5.5.0/sample_data/simple_transcriptome_target/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/stringtie_merged.gtf
TransDecoder-TransDecoder-v5.5.0/sample_data/stringtie_example/stringtie_merged.transcripts.fasta
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/Makefile
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/cleanme.pl
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/runMe.sh
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/supertranscripts.fasta
TransDecoder-TransDecoder-v5.5.0/sample_data/supertranscripts_example/supertranscripts.gtf
TransDecoder-TransDecoder-v5.5.0/util/
TransDecoder-TransDecoder-v5.5.0/util/PWM/
TransDecoder-TransDecoder-v5.5.0/util/PWM/README.md
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/build_atgPWM.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/build_pwm.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/__deprecated/score_atgPWM.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/build_atgPWM_+-.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/compute_AUC.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/deplete_feature_noise.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/feature_scores_to_ROC.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/feature_scoring.+-.pl
TransDecoder-TransDecoder-v5.5.0/util/PWM/make_seqLogo.Rscript
TransDecoder-TransDecoder-v5.5.0/util/PWM/plot_ROC.Rscript
TransDecoder-TransDecoder-v5.5.0/util/PWM/simulate_feature_seq_from_PWM.pl
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/cleanMe.sh
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/longest_orfs.cds.top_longest_5000.nr80.gz
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/pasa_assemblies.fasta.gz
TransDecoder-TransDecoder-v5.5.0/util/__pwm_tests/runMe.sh
TransDecoder-TransDecoder-v5.5.0/util/bin/
TransDecoder-TransDecoder-v5.5.0/util/bin/.hidden
TransDecoder-TransDecoder-v5.5.0/util/cdna_alignment_orf_to_genome_orf.pl
TransDecoder-TransDecoder-v5.5.0/util/compute_base_probs.pl
TransDecoder-TransDecoder-v5.5.0/util/exclude_similar_proteins.pl
TransDecoder-TransDecoder-v5.5.0/util/fasta_prot_checker.pl
TransDecoder-TransDecoder-v5.5.0/util/ffindex_resume.pl
TransDecoder-TransDecoder-v5.5.0/util/gene_list_to_gff.pl
TransDecoder-TransDecoder-v5.5.0/util/get_FL_accs.pl
TransDecoder-TransDecoder-v5.5.0/util/get_longest_ORF_per_transcript.pl
TransDecoder-TransDecoder-v5.5.0/util/get_top_longest_fasta_entries.pl
TransDecoder-TransDecoder-v5.5.0/util/gff3_file_to_bed.pl
TransDecoder-TransDecoder-v5.5.0/util/gff3_file_to_proteins.pl
TransDecoder-TransDecoder-v5.5.0/util/gff3_gene_to_gtf_format.pl
TransDecoder-TransDecoder-v5.5.0/util/gtf_genome_to_cdna_fasta.pl
TransDecoder-TransDecoder-v5.5.0/util/gtf_to_alignment_gff3.pl
TransDecoder-TransDecoder-v5.5.0/util/gtf_to_bed.pl
TransDecoder-TransDecoder-v5.5.0/util/misc/
TransDecoder-TransDecoder-v5.5.0/util/misc/__init__.py
TransDecoder-TransDecoder-v5.5.0/util/misc/get_FP_FN_scores.py
TransDecoder-TransDecoder-v5.5.0/util/misc/plot_indiv_seq_likelihood_profile.py
TransDecoder-TransDecoder-v5.5.0/util/misc/rpart_scores.Rscript
TransDecoder-TransDecoder-v5.5.0/util/misc/select_TD_orfs.py
TransDecoder-TransDecoder-v5.5.0/util/nr_ORFs_gff3.pl
TransDecoder-TransDecoder-v5.5.0/util/pfam_mpi.pbs
TransDecoder-TransDecoder-v5.5.0/util/pfam_runner.pl
TransDecoder-TransDecoder-v5.5.0/util/refine_gff3_group_iso_strip_utrs.pl
TransDecoder-TransDecoder-v5.5.0/util/refine_hexamer_scores.pl
TransDecoder-TransDecoder-v5.5.0/util/remove_eclipsed_ORFs.pl
TransDecoder-TransDecoder-v5.5.0/util/score_CDS_likelihood_all_6_frames.pl
TransDecoder-TransDecoder-v5.5.0/util/select_best_ORFs_per_transcript.pl
TransDecoder-TransDecoder-v5.5.0/util/seq_n_baseprobs_to_loglikelihood_vals.pl
TransDecoder-TransDecoder-v5.5.0/util/start_codon_refinement.pl
TransDecoder-TransDecoder-v5.5.0/util/train_start_PWM.pl
TransDecoder-TransDecoder-v5.5.0/util/uri_unescape.pl

Running it

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ ./TransDecoder-TransDecoder-v5.5.0/TransDecoder.LongOrfs -t Trinity.fasta
* Running CMD: /home/yeyuntian/Biodata/trinitytest/downstr/TransDecoder-TransDecoder-v5.5.0/util/compute_base_probs.pl Trinity.fasta 0 > /home/yeyuntian/Biodata/trinitytest/downstr/Trinity.fasta.transdecoder_dir/base_freqs.dat


-first extracting base frequencies, we'll need them later.


- extracting ORFs from transcripts.
-total transcripts to examine: 220498
[220400/220498] = 99.96% done    CMD: touch /home/yeyuntian/Biodata/trinitytest/downstr/Trinity.fasta.transdecoder_dir.__checkpoints_longorfs/TD.longorfs.ok


#################################
### Done preparing long ORFs.  ###
##################################

    Use file: /home/yeyuntian/Biodata/trinitytest/downstr/Trinity.fasta.transdecoder_dir/longest_orfs.pep  for Pfam and/or BlastP searches to enable homology-based coding region identification.

    Then, run TransDecoder.Predict for your final coding region predictions.

This creates an output directory, Trinity.fasta.transdecoder_dir

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ l -alt
total 264404
drwxrwxr-x  2 yeyuntian yeyuntian      4096 2月   2 17:15 Trinity.fasta.transdecoder_dir.__checkpoints_longorfs/
drwxrwxr-x  2 yeyuntian yeyuntian      4096 2月   2 17:09 Trinity.fasta.transdecoder_dir/
drwxrwxr-x  7 yeyuntian yeyuntian      4096 2月   2 17:07 ./
-rw-rw-r--  1 yeyuntian yeyuntian       212 2月   2 17:07 pipeliner.4094.cmds
-rw-rw-r--  1 yeyuntian yeyuntian  15748671 2月   2 16:51 TransDecoder-v5.5.0.tar.gz
drwxrwxr-x  8 yeyuntian yeyuntian      4096 10月 22 20:45 TransDecoder-TransDecoder-v5.5.0/

Then run the next command

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ ./TransDecoder-TransDecoder-v5.5.0/TransDecoder.Predict -t Trinity.fasta 

If this step reports that seqLogo does not exist, that is because the command calls an R package, which can be installed from Bioconductor.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("seqLogo", version = "3.8")

In the end we can see the files this tool generated:

yeyuntian@yeyuntian-RESCUER-R720-15IKBN:~/Biodata/trinitytest/downstr$ ll -alt 
total 558620
-rw-rw-r--  1 yeyuntian yeyuntian      3016 2月   2 18:17 pipeliner.7404.cmds
drwxrwxr-x  8 yeyuntian yeyuntian      4096 2月   2 18:17 ./
drwxrwxr-x  2 yeyuntian yeyuntian      4096 2月   2 17:43 Trinity.fasta.transdecoder_dir.__checkpoints/
-rw-rw-r--  1 yeyuntian yeyuntian 133585627 2月   2 17:43 Trinity.fasta.transdecoder.cds
-rw-rw-r--  1 yeyuntian yeyuntian      3016 2月   2 17:43 pipeliner.5796.cmds
-rw-rw-r--  1 yeyuntian yeyuntian  56068696 2月   2 17:43 Trinity.fasta.transdecoder.pep
-rw-rw-r--  1 yeyuntian yeyuntian  19376844 2月   2 17:42 Trinity.fasta.transdecoder.bed
-rw-rw-r--  1 yeyuntian yeyuntian  92216622 2月   2 17:42 Trinity.fasta.transdecoder.gff3

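Among them
.pep (protein sequences encoded by the final candidate ORFs)
.cds (nucleotide sequences of the coding regions)
.gff3 (positions of the ORFs on the transcripts)
.bed (for later visualization in IGV)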
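A quick sanity check on the .pep file is to tally the predicted ORFs by type. The sketch below assumes each header carries a `type:<name>` field such as `type:complete` or `type:5prime_partial`; check this against your own headers first, because the layout is an assumption here. `orf_types` is a hypothetical helper, and the demo headers are simplified toy records.

```shell
# Tally ORFs by the "type:<name>" field found in the FASTA headers.
# Assumes headers contain such a field; orf_types is a hypothetical helper.
orf_types() (
    grep '^>' "$1" | grep -o 'type:[^ ]*' | sed 's/^type://' | sort | uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
)

# Toy input with simplified, made-up headers.
toy_pep=$(mktemp)
printf '%s\n' \
    '>t1.p1 ORF type:complete len:4' 'MKV*' \
    '>t2.p1 ORF type:complete len:4' 'MAA*' \
    '>t3.p1 ORF type:5prime_partial len:4' 'KVV*' > "$toy_pep"
orf_types "$toy_pep"
```

On real data, call it as `orf_types Trinity.fasta.transdecoder.pep`.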
