1. Extract the CDS (coding sequences)
(Omitted.)
2. Compute codon usage bias
Use cusp from EMBOSS:
cusp -sequence R6517.cds -outfile R6517.codon.freq
#CdsCount: 53
#Coding GC 37.91%
#1st letter GC 46.19%
#2nd letter GC 38.10%
#3rd letter GC 29.44%
#Codon AA Fraction Frequency Number
GCA A 0.280 15.577 333
GCC A 0.168 9.309 199
3. Clustering
Reference: https://www.cnblogs.com/foreverycc/archive/2013/04/19/3029873.html
Arrange the RSCU values into a table, one row per codon and one column per genome:
head coden_RSCU.tab
Codon KT220690 KT220692 MCB NC027259 NC029370 NC035143 NC035233 NC037247
GCA 1.1197 1.1163 1.1193 1.1053 1.1234 1.1047 1.1212 1.1050
GCC .6273 .6239 .6521 .6571 .6230 .6452 .6700 .7050
GCG .4856 .4856 .4436 .4953 .5004 .4695 .4646 .4610
GCT 1.7672 1.7740 1.7848 1.7422 1.7531 1.7804 1.7441 1.7288
TGC .4705 .4705 .4913 .5361 .5263 .4588 .4869 .4608
TGT 1.5294 1.5294 1.5086 1.4638 1.4736 1.5411 1.5130 1.5391
GAC .3952 .3971 .3901 .3962 .3888 .3887 .3948 .4155
GAT 1.6047 1.6028 1.6098 1.6037 1.6111 1.6112 1.6051 1.5844
GAA 1.5342 1.5360 1.5512 1.5275 1.5257 1.5459 1.5375 1.5326
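The notes do not show how the cusp output was turned into this table, so the following is only a sketch under assumptions. It uses the standard RSCU definition (observed codon count divided by the mean count of the synonymous codons for the same amino acid) and reads the Codon, AA and Number columns of the cusp output. The function names cusp_to_rscu and merge_rscu and the .rscu file suffix are hypothetical, and stop codons (which cusp labels "*") are simply treated as one synonymous family.

```shell
# Hypothetical helper: convert one cusp output file into "codon RSCU" lines.
# RSCU = codon count * number of synonymous codons / total count for that amino acid.
cusp_to_rscu() {
    awk '
        /^#/ || NF < 5 { next }          # skip header/comment lines and blank lines
        {
            k++
            codon[k] = $1; aa[k] = $2; n[k] = $5
            total[$2] += $5              # total count per amino acid
            syn[$2]++                    # number of synonymous codons seen
        }
        END {
            for (i = 1; i <= k; i++) {
                a = aa[i]
                rscu = (total[a] > 0) ? n[i] * syn[a] / total[a] : 0
                printf "%s %.4f\n", codon[i], rscu
            }
        }
    ' "$1"
}

# Hypothetical helper: merge several "codon RSCU" files into one table.
# Column names are taken from the file names with the directory and ".rscu" removed.
merge_rscu() {
    awk '
        FNR == 1 {
            ncol++
            label[ncol] = FILENAME
            sub(/^.*\//, "", label[ncol])
            sub(/\.rscu$/, "", label[ncol])
        }
        {
            if (!($1 in seen)) { seen[$1] = 1; codons[++k] = $1 }
            val[$1, ncol] = $2
        }
        END {
            printf "Codon"
            for (j = 1; j <= ncol; j++) printf " %s", label[j]
            printf "\n"
            for (i = 1; i <= k; i++) {
                printf "%s", codons[i]
                for (j = 1; j <= ncol; j++) printf " %s", val[codons[i], j]
                printf "\n"
            }
        }
    ' "$@"
}
```

For example, cusp_to_rscu R6517.codon.freq > R6517.rscu would produce the per-genome file, and merge_rscu followed by all the .rscu files would write a table in the layout shown above. Values are printed with a leading zero (0.6273 rather than .6273), which read.table in R parses the same way.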
Cluster with the pvclust package:
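library(pvclust)
# Read the RSCU table; the file name must be a quoted string
a <- read.table("coden_RSCU.tab", header = TRUE)
# Keep the 8 genome columns (column 1 holds the codon names)
b <- a[, 2:9]
# pvclust clusters the columns (genomes); "ward" is named "ward.D" in R >= 3.1.0
fit <- pvclust(b, method.hclust = "ward.D", method.dist = "euclidean")
plot(fit, cex = .5)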