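Step 1: Align the sequences with a multiple sequence alignment tool. I used mafft here (official documentation: https://mafft.cbrc.jp/alignment/software/).
Official usage:
mafft --auto input > output
What I ran:
mafft --auto Ath.direct+inverted_ID.cds.229.fasta > Ath.direct+inverted_ID.cds.229.aln.fasta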
The terminal output of the mafft command was:
nthread = 0
nthreadpair = 0
nthreadtb = 0
ppenalty_ex = 0
stacksize: 8192 kb
generating a scoring matrix for nucleotide (dist=200) ... done
Gap Penalty = -1.53, +0.00, +0.00
Making a distance matrix ..
There are 1 ambiguous characters.
201 / 229
done.
Constructing a UPGMA tree (efffree=0) ...
220 / 229
done.
Progressive alignment 1/2...
STEP 129 / 228 f
Reallocating..done. *alloclen = 23649
STEP 176 / 228 f
Reallocating..done. *alloclen = 26535
STEP 226 / 228 f
Reallocating..done. *alloclen = 27829
STEP 228 / 228 f
done.
Making a distance matrix from msa..
200 / 229
done.
Constructing a UPGMA tree (efffree=1) ...
220 / 229
done.
Progressive alignment 2/2...
STEP 209 / 228 f
Reallocating..done. *alloclen = 27476
STEP 228 / 228 f
done.
disttbfast (nuc) Version 7.471
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
0 thread(s)
Strategy:
FFT-NS-2 (Fast but rough)
Progressive method (guide trees were built 2 times.)
If unsure which option to use, try 'mafft --auto input > output'.
For more information, see 'mafft --help', 'mafft --man' and the mafft page.
The default gap scoring scheme has been changed in version 7.110 (2013 Oct).
It tends to insert more gaps into gap-rich regions than previous versions.
To disable this change, add the --leavegappyregion option.
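From this output, note that the mafft version is 7.471 and that the program selected the FFT-NS-2 alignment strategy. (Both are needed later when describing the materials and methods in a paper.)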
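Rather than copying the version and strategy by hand, they could be pulled out of a saved copy of the log. The following is only a minimal sketch: `parse_mafft_log` and `MAFFT_LOG` are hypothetical names introduced here, and the embedded text is an excerpt of the log from this run with its line breaks restored.

```python
import re

# Excerpt of the mafft log from this post; in practice, read it from a saved file.
MAFFT_LOG = """\
disttbfast (nuc) Version 7.471
alg=A, model=DNA200 (2), 1.53 (4.59), -0.00 (-0.00), noshift, amax=0.0
0 thread(s)

Strategy:
 FFT-NS-2 (Fast but rough)
 Progressive method (guide trees were built 2 times.)
"""


def parse_mafft_log(text):
    """Return (version, strategy) from mafft log text; a missing field is None."""
    # The program banner line carries the version with a capital "Version".
    version = re.search(r"Version\s+(\d+(?:\.\d+)*)", text)
    # The strategy name is the first token after the "Strategy:" heading.
    strategy = re.search(r"Strategy:\s*(\S+)", text)
    return (
        version.group(1) if version else None,
        strategy.group(1) if strategy else None,
    )


version, strategy = parse_mafft_log(MAFFT_LOG)
print(f"mafft v{version}, strategy {strategy}")
```

The regular expressions assume the log layout seen in this run; other mafft versions may print the banner differently.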
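# With --auto, the program chooses the alignment strategy automatically; the default output format is FASTA.
To write the alignment in Clustal format (an .aln file) instead, use the following command: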
mafft --clustalout input.fasta > input.out
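Before moving on to tree building, it can be worth confirming that a FASTA-format alignment is well formed, meaning every aligned sequence has the same length. The sketch below is one way to check this using only the standard library; `read_fasta` and `alignment_width` are hypothetical helper names, and the toy sequences are made up for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence.

    Note: a repeated header overwrites the earlier record in this simple sketch.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            # Sequence lines may be wrapped, so collect the pieces and join later.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_width(records):
    """Return the shared aligned length, or raise ValueError if lengths differ."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences have differing lengths: {sorted(lengths)}")
    return lengths.pop()


# Toy alignment (made-up sequences) standing in for the real .aln.fasta file.
toy = """>seq1
ATG-CC
GA
>seq2
ATGTCCGA
"""
records = read_fasta(toy)
print(len(records), "sequences, width", alignment_width(records))
```

For a real file, the text would be read with `open(path).read()` and passed to `read_fasta` in the same way.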
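Step 2: Next, build a maximum-likelihood (ML) phylogenetic tree from the alignment file with FastTree. (FastTree website: http://www.microbesonline.org/fasttree/#Install)
Downloading it is all the installation required.
Run:
FastTree -gtr -nt -gamma alignment_file > tree_file
What I ran:
FastTree -gtr -nt -gamma Ath.direct+inverted_ID.cds.229.aln.fasta > Ath.direct+inverted_ID.cds.229.aln.fasta.tree.nwk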
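Both programs write their results to standard output and their progress messages to standard error, so the two steps can be scripted in a way that also keeps the logs on disk. The sketch below is an assumption-laden illustration rather than part of the original workflow: `build_commands` and `run_to_files` are hypothetical helpers, the log names `mafft.log` and `fasttree.log` are made up, and nothing is executed unless both tools and the input file are found.

```python
import os
import shlex
import shutil
import subprocess


def build_commands(fasta, aligned):
    """Return the mafft and FastTree command lines from this post as argument lists."""
    mafft_cmd = ["mafft", "--auto", fasta]
    fasttree_cmd = ["FastTree", "-gtr", "-nt", "-gamma", aligned]
    return mafft_cmd, fasttree_cmd


def run_to_files(cmd, out_path, log_path):
    """Run cmd, sending stdout to out_path and stderr (the progress log) to log_path."""
    with open(out_path, "w") as out, open(log_path, "w") as log:
        subprocess.run(cmd, stdout=out, stderr=log, check=True)


if __name__ == "__main__":
    fasta = "Ath.direct+inverted_ID.cds.229.fasta"
    aligned = "Ath.direct+inverted_ID.cds.229.aln.fasta"
    tree = aligned + ".tree.nwk"
    mafft_cmd, fasttree_cmd = build_commands(fasta, aligned)

    tools_found = shutil.which("mafft") and shutil.which("FastTree")
    if tools_found and os.path.exists(fasta):
        run_to_files(mafft_cmd, aligned, "mafft.log")      # hypothetical log name
        run_to_files(fasttree_cmd, tree, "fasttree.log")   # hypothetical log name
    else:
        # Nothing is executed; just show what would be run.
        print(shlex.join(mafft_cmd), ">", shlex.quote(aligned))
        print(shlex.join(fasttree_cmd), ">", shlex.quote(tree))
```

Passing the arguments as lists avoids shell quoting problems with file names that contain characters such as `+`.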
The terminal output of the FastTree command was:
FastTree Version 2.1.11 SSE3 ###
Alignment: Ath.direct+inverted_ID.cds.229.aln.fasta
Nucleotide distances: Jukes-Cantor Joins: balanced Support: SH-like 1000
Search: Normal +NNI +SPR (2 rounds range 10) +ML-NNI opt-each=1
TopHits: 1.00*sqrtN close=default refresh=0.80
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories ###
Ignored unknown character n (seen 1 times)
Initial topology in 2.07 seconds
Refining topology: 31 rounds ME-NNIs, 2 rounds ME-SPRs, 16 rounds ML-NNIs
Total branch-length 98.807 after 18.26 sec
ML-NNI round 1: LogLk = -578498.850 NNIs 45 max delta 21.75 Time 32.02
GTR Frequencies: 0.3022 0.2199 0.2241 0.2538
GTR rates(ac ag at cg ct gt) 1.0483 2.5389 1.0248 0.9926 2.7404 1.0000
Switched to using 20 rate categories (CAT approximation)
Rate categories were divided by 0.800 so that average rate = 1.0
CAT-based log-likelihoods may not be comparable across runs
ML-NNI round 2: LogLk = -558919.789 NNIs 17 max delta 7.53 Time 58.68
ML-NNI round 3: LogLk = -558887.713 NNIs 8 max delta 0.77 Time 64.15
ML-NNI round 4: LogLk = -558870.798 NNIs 1 max delta 0.11 Time 67.04
ML-NNI round 5: LogLk = -558870.004 NNIs 1 max delta 0.51 Time 68.13
ML-NNI round 6: LogLk = -558869.763 NNIs 0 max delta 0.00 Time 68.71
Turning off heuristics for final round of ML NNIs (converged)
ML-NNI round 7: LogLk = -558646.178 NNIs 0 max delta 0.00 Time 81.92 (final)
Optimize all lengths: LogLk = -558636.448 Time 85.18
Gamma(20) LogLk = -566145.706 alpha = 9.988 rescaling lengths by 1.223
Total time: 103.97 seconds Unique: 227/229 Bad splits: 0/224
As before, just note the software version and the model; see the lines marked with a trailing ###.
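The same details can be extracted from a saved FastTree log for the methods section. This is a minimal sketch under the assumption that the log looks like the one from this run; `parse_fasttree_log` and `FASTTREE_LOG` are hypothetical names, and the embedded text is an excerpt of that log with line breaks restored and the ### markers left out.

```python
import re

# Excerpt of the FastTree log from this post; in practice, read it from a saved file.
FASTTREE_LOG = """\
FastTree Version 2.1.11 SSE3
ML Model: Generalized Time-Reversible, CAT approximation with 20 rate categories
ML-NNI round 7: LogLk = -558646.178 NNIs 0 max delta 0.00 Time 81.92 (final)
Optimize all lengths: LogLk = -558636.448 Time 85.18
Gamma(20) LogLk = -566145.706 alpha = 9.988 rescaling lengths by 1.223
"""


def parse_fasttree_log(text):
    """Return the version, model description, and Gamma log-likelihood from a log."""
    version = re.search(r"FastTree\s*Version\s+(\S+)", text)
    # '.' does not match a newline, so this captures the rest of the model line only.
    model = re.search(r"ML\s*Model:\s*(.+)", text)
    gamma = re.search(r"Gamma\(\d+\)\s*LogLk\s*=\s*(-?\d+(?:\.\d+)?)", text)
    return {
        "version": version.group(1) if version else None,
        "model": model.group(1).strip() if model else None,
        "gamma_loglk": float(gamma.group(1)) if gamma else None,
    }


for key, value in parse_fasttree_log(FASTTREE_LOG).items():
    print(f"{key}: {value}")
```

Fields that are absent from a log come back as `None`, so a log from a run without `-gamma` would simply lack the last value.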
Step 3: Finally, open the resulting tree file, Ath.direct+inverted_ID.cds.229.aln.fasta.tree.nwk, and style it as needed, as shown below: