Raw data (annotation information)

chr01	irgsp1_rep	mRNA	2983	10815	.	+	.	ID=Os01t0100100-01;Name=Os01t0100100-01;Locus_id=Os01g0100100;Note=RabGAP/TBC domain containing protein.;Transcript_evidence=AK242339 (DDBJ%2C antisense transcript);ORF_evidence=Q655M0 (UniProt);GO=Molecular Function: Rab GTPase activator activity (GO:0005097),Cellular Component: intracellular (GO:0005622),Biological Process: regulation of Rab GTPase activity (GO:0032313);InterPro=RabGAP/TBC (IPR000195);NIAS_FLcDNA=J075199P03;TENOR=Os01t0100100-01;KEGG=Os01t0100100-01
chr01	irgsp1_rep	mRNA	11218	12435	.	+	.	ID=Os01t0100200-01;Name=Os01t0100200-01;Locus_id=Os01g0100200;Note=Conserved hypothetical protein.;Transcript_evidence=AK059894 (DDBJ%2C Best hit);ORF_evidence=B8ACR2 (UniProt);NIAS_FLcDNA=006-208-E01;Expression=AK059894;TENOR=Os01t0100200-01;KEGG=Os01t0100200-01

Using cut on the command line

less test_ori|cut -f 9     # -f selects columns; here we take the ninth column
ID=Os01t0100100-01;Name=Os01t0100100-01;Locus_id=Os01g0100100;Note=RabGAP/TBC domain containing protein.;Transcript_evidence=AK242339 (DDBJ%2C antisense transcript);ORF_evidence=Q655M0 (UniProt);GO=Molecular Function: Rab GTPase activator activity (GO:0005097),Cellular Component: intracellular (GO:0005622),Biological Process: regulation of Rab GTPase activity (GO:0032313);InterPro=RabGAP/TBC (IPR000195);NIAS_FLcDNA=J075199P03;TENOR=Os01t0100100-01;KEGG=Os01t0100100-01
ID=Os01t0100200-01;Name=Os01t0100200-01;Locus_id=Os01g0100200;Note=Conserved hypothetical protein.;Transcript_evidence=AK059894 (DDBJ%2C Best hit);ORF_evidence=B8ACR2 (UniProt);NIAS_FLcDNA=006-208-E01;Expression=AK059894;TENOR=Os01t0100200-01;KEGG=Os01t0100200-01
less test_ori|cut -f 9|cut -d ";" -f 1,4,5   # if the columns are not tab-separated, use -d to specify the delimiter
ID=Os01t0100100-01;Note=RabGAP/TBC domain containing protein.;Transcript_evidence=AK242339 (DDBJ%2C antisense transcript)
ID=Os01t0100200-01;Note=Conserved hypothetical protein.;Transcript_evidence=AK059894 (DDBJ%2C Best hit)
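The two forms of cut shown above can be tried on a throwaway record. This is a minimal sketch: the temporary file and the shortened attribute string are assumptions made for illustration, not the real test_ori.

```shell
# Build a one-line, tab-separated sample record (shortened, hypothetical data).
tmp=$(mktemp)
printf 'chr01\tirgsp1_rep\tmRNA\t2983\t10815\t.\t+\t.\tID=Os01t0100100-01;Name=Os01t0100100-01;Locus_id=Os01g0100100;Note=RabGAP/TBC domain containing protein.\n' > "$tmp"

# cut splits on tabs by default; -f 9 keeps only the ninth column.
col9=$(cut -f 9 "$tmp")

# -d ';' switches the delimiter to a semicolon; -f 1,4 keeps fields 1 and 4.
picked=$(printf '%s\n' "$col9" | cut -d ';' -f 1,4)

printf '%s\n' "$picked"
rm -f "$tmp"
```

Note that cut always prints the selected fields in their original order, so -f 4,1 gives the same result as -f 1,4.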

Pattern matching on the command line: removing redundant information

less test_ori|cut -f 9|cut -d ";" -f 1,4,5|perl -npe "s/ID=//g"   # s/PATTERN/REPLACEMENT/g; g means global, so every match on a line is replaced, not only the first
Os01t0100100-01;Note=RabGAP/TBC domain containing protein.;Transcript_evidence=AK242339 (DDBJ%2C antisense transcript)
Os01t0100200-01;Note=Conserved hypothetical protein.;Transcript_evidence=AK059894 (DDBJ%2C Best hit)
less test_ori|cut -f 9|cut -d ";" -f 1,4,5|perl -npe "s/ID=//g"|perl -npe "s/Note=//g"|perl -npe "s/Transcript_evidence=//g"  # replace each of the three tags with the empty string
Os01t0100100-01;RabGAP/TBC domain containing protein.;AK242339 (DDBJ%2C antisense transcript)
Os01t0100200-01;Conserved hypothetical protein.;AK059894 (DDBJ%2C Best hit)
less test_ori|cut -f 9|cut -d ";" -f 1,4,5|perl -npe "s/ID=//g"|perl -npe "s/Note=//g"|perl -npe "s/Transcript_evidence=//g"|perl -npe "s/;/\t/g"  # turn ; into \t so the result is tab-separated and can be opened in Excel after export
Os01t0100100-01	RabGAP/TBC domain containing protein.	AK242339 (DDBJ%2C antisense transcript)
Os01t0100200-01	Conserved hypothetical protein.	AK059894 (DDBJ%2C Best hit)

Advanced pattern matching

Observe how the data changes when the command perl -npe "s/Note=.*?;//g" is appended to the pipeline

less test_ori|cut -f 9|cut -d ";" -f 1,4-6
ID=Os01t0100100-01;Note=RabGAP/TBC domain containing protein.;Transcript_evidence=AK242339 (DDBJ%2C antisense transcript);ORF_evidence=Q655M0 (UniProt)
ID=Os01t0100200-01;Note=Conserved hypothetical protein.;Transcript_evidence=AK059894 (DDBJ%2C Best hit);ORF_evidence=B8ACR2 (UniProt)
less test_ori|cut -f 9|cut -d ";" -f 1,4-6|perl -npe "s/Note=.*?;//g"
ID=Os01t0100100-01;Transcript_evidence=AK242339 (DDBJ%2C antisense transcript);ORF_evidence=Q655M0 (UniProt)
ID=Os01t0100200-01;Transcript_evidence=AK059894 (DDBJ%2C Best hit);ORF_evidence=B8ACR2 (UniProt)

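perl -npe "s/Note=.*?;//g"
# .* matches any run of characters (zero or more of any character except a newline)

# We want to remove the Note field, that is, everything from Note= up to and including the first ; that follows it

# Pattern matching is greedy by default and matches the longest string it can; without the ?, .* would run on to the last ; on each line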

Changing the column order in the output

less test_ori|cut -f 9|cut -d ";" -f 1,4,5|perl -npe "s/ /_/g"|perl -npe "s/;/\t/g"
ID=Os01t0100100-01  Note=RabGAP/TBC_domain_containing_protein.  Transcript_evidence=AK242339_(DDBJ%2C_antisense_transcript)
ID=Os01t0100200-01  Note=Conserved_hypothetical_protein.    Transcript_evidence=AK059894_(DDBJ%2C_Best_hit)
less test_ori|cut -f 9|cut -d ";" -f 1,4,5|perl -npe "s/ /_/g"|perl -npe "s/;/\t/g"|awk '{print $2,$1,$3}'   # print the second, first and third columns, in that order
Note=RabGAP/TBC_domain_containing_protein. ID=Os01t0100100-01 Transcript_evidence=AK242339_(DDBJ%2C_antisense_transcript)
Note=Conserved_hypothetical_protein. ID=Os01t0100200-01 Transcript_evidence=AK059894_(DDBJ%2C_Best_hit)

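Note that awk, by default, splits fields on spaces as well as on tabs, and joins the printed fields with a single space.

That is why the spaces in Note=RabGAP/TBC domain containing protein. are replaced with underscores (_) before the data reaches awk.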
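As an alternative to the underscore workaround, awk can be told to split and join on tabs only. This is a sketch under the assumption that the input is already tab-separated; the sample row is shortened for illustration.

```shell
# A tab-separated row whose second column still contains spaces.
row=$(printf 'ID=Os01t0100100-01\tNote=RabGAP/TBC domain containing protein.\tTranscript_evidence=AK242339')

# Default splitting: any whitespace separates fields, so $2 is cut short.
default_second=$(printf '%s\n' "$row" | awk '{print $2}')

# -F '\t' splits on tabs only; OFS='\t' joins the printed fields with tabs.
reordered=$(printf '%s\n' "$row" | awk -F '\t' -v OFS='\t' '{print $2, $1, $3}')

printf '%s\n' "$default_second"
printf '%s\n' "$reordered"
```

With the tab settings the output stays tab-separated, so it can be passed on to cut or opened in a spreadsheet without further changes.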

Row processing: extracting specific lines

less test_ori|cut -f 9|cut -d ";" -f1,4,5|perl -npe "s/ /_/g"|perl -npe "s/;/\t/g"
ID=Os01t0100100-01  Note=RabGAP/TBC_domain_containing_protein.  Transcript_evidence=AK242339_(DDBJ%2C_antisense_transcript)
ID=Os01t0100200-01  Note=Conserved_hypothetical_protein.    Transcript_evidence=AK059894_(DDBJ%2C_Best_hit)
less test_ori|cut -f 9|cut -d ";" -f 1,4,5|perl -npe "s/ /_/g"|perl -npe "s/;/\t/g"|sed -n "2p"   # print only the second line
ID=Os01t0100200-01  Note=Conserved_hypothetical_protein.    Transcript_evidence=AK059894_(DDBJ%2C_Best_hit)
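The same sed -n idea extends to ranges and to the last line. A brief sketch on four placeholder rows (row1 to row4 are made up for the example):

```shell
rows=$(printf 'row1\nrow2\nrow3\nrow4\n')

# -n suppresses the default output; "2p" prints line 2 only.
second=$(printf '%s\n' "$rows" | sed -n '2p')

# "2,3p" prints an inclusive range of lines.
middle=$(printf '%s\n' "$rows" | sed -n '2,3p')

# "$" addresses the last line; single quotes keep the shell from expanding it.
last=$(printf '%s\n' "$rows" | sed -n '$p')

# awk equivalent of "2p": NR is the current line number.
second_awk=$(printf '%s\n' "$rows" | awk 'NR == 2')

printf '%s\n' "$second" "$middle" "$last" "$second_awk"
```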

That covers the commonly used commands. How you combine them to get the most out of them is up to you.

Writing output to a file

less test_ori|cut -f 9|cut -d ";" -f 1,4,5|perl -npe "s/ /_/g"|perl -npe "s/;/\t/g">result  # > followed by a file name writes the output to that file

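When the file is very large and printing everything to standard output is impractical, use head to check the first few lines while building up the pipeline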

less test_ori|cut -f 9|cut -d ";" -f 1,4,5|perl -npe "s/ /_/g"|perl -npe "s/;/\t/g"|head -n 10   # show the first 10 lines of the result
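Redirection and head can be tried safely on a temporary file. This is a minimal sketch; the file comes from mktemp and the line contents are placeholders.

```shell
out=$(mktemp)

# ">" creates the file, or overwrites it if it already exists.
printf 'line%s\n' 1 2 3 4 5 > "$out"

# ">>" appends to the file instead of overwriting it.
printf 'line%s\n' 6 >> "$out"

# head -n N prints only the first N lines, however large the file is.
first_two=$(head -n 2 "$out")
total=$(wc -l < "$out" | tr -d ' ')

printf '%s\n' "$first_two"
printf 'total lines: %s\n' "$total"
rm -f "$out"
```

Because > overwrites without asking, it is worth checking the pipeline with head first and adding the redirection only once the output looks right.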

Getting started with processing sequence data using perl on the command line