bcftools query file.vcf.gz -f'%FS\n' > file_FS.txt
bcftools query file.vcf.gz -f '%FS\t%SOR\t%MQRankSum\t%ReadPosRankSum\t%QD\t%MQ\t%DP\n' > file_FS.SOR.MQRS.RPRS.QD.MQ.DP.txt
-f defines the output format. The format string %FS\t%SOR\t… means that for each variant the FS value is printed first, then a tab (\t), then the SOR value, then another tab, and so on. The final \n tells the program to start a new line after all seven values (FS, SOR, MQRankSum, ReadPosRankSum, QD, MQ, DP).
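As a quick sanity check on such a table, the sketch below uses awk to confirm that every row has seven tab-separated fields and to report the range of the first column (FS). The three input rows are invented for illustration and the file name mock_metrics.txt is hypothetical. bcftools query prints "." for a tag that is missing from a record, so those values are skipped.

```shell
tmp=$(mktemp -d)
# Made-up stand-in for the seven-column output of bcftools query
# (FS, SOR, MQRankSum, ReadPosRankSum, QD, MQ, DP); all values are invented.
printf '0.000\t0.693\t0.00\t0.52\t25.1\t60.00\t31\n'  > "$tmp/mock_metrics.txt"
printf '12.5\t1.200\t-1.30\t.\t14.8\t58.70\t45\n'    >> "$tmp/mock_metrics.txt"
printf '.\t2.900\t-3.10\t-2.40\t3.2\t41.20\t12\n'    >> "$tmp/mock_metrics.txt"

# Check the column count and summarise FS, skipping missing values (".").
summary=$(awk -F'\t' '
    NF != 7 { bad++ }
    $1 != "." {
        v = $1 + 0                      # force a numeric comparison
        if (n == 0 || v < min) min = v
        if (n == 0 || v > max) max = v
        n++
    }
    END { printf "rows=%d bad_rows=%d FS_n=%d FS_min=%s FS_max=%s", NR, bad, n, min, max }
' "$tmp/mock_metrics.txt")
echo "$summary"
```

The same awk program can be pointed at the real output file instead of the mock one; changing $1 to another field number summarises a different column.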
Comparing two VCF files
>bedtools intersect -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam> [OPTIONS]
bedtools intersect -u -a first.vcf.gz -b second.vcf.gz | wc -l
>vcf-compare first.vcf.gz second.vcf.gz
# First, compress the VCF using bgzip, then index the gzipped VCF
>bgzip first.vcf
>tabix -p vcf first.vcf.gz
>bcftools isec first.vcf.gz second.vcf.gz -p folder
0000.vcf # records private to first.vcf.gz
0001.vcf # records private to second.vcf.gz
0002.vcf # records from first.vcf.gz shared by both
0003.vcf # records from second.vcf.gz shared by both
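To turn these four files into the numbers of a Venn diagram, one can count the non-header lines in each. A minimal sketch, assuming the output directory is the "folder" passed to -p above; count_isec_records is a hypothetical helper name, not part of bcftools.

```shell
# Print "<file name><TAB><number of records>" for each isec output file.
# Header lines (starting with "#") are not counted.
count_isec_records() {
    dir=$1
    for f in "$dir"/000[0-3].vcf; do
        [ -e "$f" ] || continue          # skip if the directory has no such file
        printf '%s\t%s\n' "$(basename "$f")" "$(grep -vc '^#' "$f")"
    done
}

count_isec_records folder
```

The counts for 0002.vcf and 0003.vcf would normally be equal, since both hold the shared records, once as they appear in each input file.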
grep -F -f file1 file2  # simplest way to obtain the overlapping rows
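One caveat with the grep approach: -F -f matches fixed strings anywhere in a line, so a key such as chr1<TAB>100 also matches chr1<TAB>1000. The sketch below contrasts this with whole-line matching via -x. It assumes both files hold one key per line; the CHROM and POS pairs are made up and the temporary file paths are hypothetical.

```shell
tmp=$(mktemp -d)
printf 'chr1\t100\nchr2\t50\n'            > "$tmp/file1"
printf 'chr1\t1000\nchr1\t100\nchr3\t7\n' > "$tmp/file2"

# Substring matching: "chr1<TAB>100" also hits "chr1<TAB>1000".
loose=$(grep -c -F -f "$tmp/file1" "$tmp/file2")
# Whole-line matching: only identical rows count as overlap.
strict=$(grep -c -x -F -f "$tmp/file1" "$tmp/file2")
echo "loose=$loose strict=$strict"
```

With full VCF lines, -x only reports rows that are identical in every column, so differences in INFO or genotype fields would hide an otherwise shared site.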
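Summary:
BEDTools can be used to compare VCF files, but only by comparing genomic coordinates; this gives a quick answer to how many variant sites overlap between the two files, and it can be used to compute the Jaccard index, which indicates the overall degree of overlap between the sites in the two files.
vcf-compare provides statistics beyond those of BEDTools, including the number of duplicate sites and the Venn-diagram numbers, which show how many variants are unique to each VCF file.
bcftools isec also provides the Venn-diagram numbers and additionally writes new VCF files based on these intersections.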
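The Venn-diagram numbers and a site-level Jaccard index can also be approximated with standard Unix tools. The sketch below is not how BEDTools, vcf-compare, or bcftools compute their statistics; it assumes uncompressed VCFs, keys each record on CHROM, POS, REF and ALT, and uses two made-up three-record files under a temporary directory.

```shell
tmp=$(mktemp -d)
# Two tiny made-up VCFs (one header line plus three records each).
printf '#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr1\t200\t.\tC\tT\nchr2\t50\t.\tG\tA\n' > "$tmp/first.vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\nchr1\t100\t.\tA\tG\nchr2\t50\t.\tG\tA\nchr2\t75\t.\tT\tC\n'  > "$tmp/second.vcf"

# Reduce each file to a sorted, de-duplicated list of CHROM:POS:REF:ALT keys.
for name in first second; do
    grep -v '^#' "$tmp/$name.vcf" | awk -F'\t' '{ print $1":"$2":"$4":"$5 }' \
        | LC_ALL=C sort -u > "$tmp/$name.keys"
done

# comm -23: keys only in first; -13: keys only in second; -12: keys in both.
only_first=$(LC_ALL=C comm -23 "$tmp/first.keys" "$tmp/second.keys" | wc -l)
only_second=$(LC_ALL=C comm -13 "$tmp/first.keys" "$tmp/second.keys" | wc -l)
shared=$(LC_ALL=C comm -12 "$tmp/first.keys" "$tmp/second.keys" | wc -l)

# Jaccard index = shared sites / sites in the union of both files.
jaccard=$(awk -v s="$shared" -v a="$only_first" -v b="$only_second" \
    'BEGIN { printf "%.2f", s / (s + a + b) }')
echo "only_first=$((only_first)) only_second=$((only_second)) shared=$((shared)) jaccard=$jaccard"
```

Keying on REF and ALT as well as position makes this stricter than a purely coordinate-based comparison; dropping $4 and $5 from the awk key gives the position-only version.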
For now, I personally still lean towards bcftools!