ANNOVAR annotation

Re-annotating with ANNOVAR:
First, convert the VCF into ANNOVAR's input format (.avinput):

~/biosoft/annovar/convert2annovar.pl -format vcf4 pooling_variants_all_variants.hg19-hg38.vcf > pooling_variants_all_variants.hg19-hg38.avinput
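
For reference, the first five columns of an .avinput file are chromosome, start, end, reference allele and alternate allele. The sketch below uses a hypothetical function, vcf_snv_to_avinput, to reproduce those columns for biallelic SNVs only, which is enough to see what the conversion does. It is not a replacement for convert2annovar.pl, which also handles indels, multi-allelic sites and other VCF details.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ANNOVAR): read a VCF on stdin and print
# avinput-style columns (chr, start, end, ref, alt) for biallelic SNVs.
# Indels and multi-allelic records are skipped; use convert2annovar.pl for real data.
vcf_snv_to_avinput() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                                   # skip VCF header lines
    length($4) == 1 && length($5) == 1 && $5 != "." {
      print $1, $2, $2, $4, $5                      # SNV: start == end == POS
    }'
}
```

Usage: `vcf_snv_to_avinput < input.vcf > snv_only.avinput`.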

Next, download the required database files.
The download options, as listed in the script's help output:

(base) root@1100150:~/biosoft/annovar# ./annotate_variation.pl | grep downdb
               --downdb                   download annotation database
               --webfrom <string>         specify the source of database (ucsc or annovar or URL) (downdb operation)
            annotate_variation.pl -downdb -webfrom annovar refGene humandb/
            annotate_variation.pl -downdb -buildver mm9 refGene mousedb/
            annotate_variation.pl -downdb -buildver hg19 -webfrom annovar esp6500siv2_all humandb/

Databases downloaded:

nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar ensGene humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar esp6500siv2_all humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar dbnsfp35a humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar gnomad30_genome humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar regsnpintron humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar avsnp150 humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar gme humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar gene4denovo201907 humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar 1000g2015aug humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom 'http://www.openbioinformatics.org/annovar/download/GDI_full_10282015.txt.gz' humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom 'http://www.openbioinformatics.org/annovar/download/RVIS_ExAC_4KW.txt.gz' humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom 'http://download.openbioinformatics.org/spidex_download_form.php' humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar mcap humandb/ &
nohup ./annotate_variation.pl -downdb -buildver hg38 -webfrom annovar revel humandb/ &
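
The list above starts about a dozen background downloads at once. As an alternative, here is a minimal sketch that loops over the same `-webfrom annovar` tables one at a time and writes one log per table. The function name download_dbs, the ANNOVAR_DIR default (taken from the ~/biosoft/annovar path used in these notes) and the DRY_RUN switch are all assumptions for illustration; the three URL-based downloads (GDI, RVIS, SPIDEX) are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "annotate_variation.pl -downdb -webfrom annovar".
# Downloads each table sequentially for one build; DRY_RUN=1 only prints commands.
ANNOVAR_DIR="${ANNOVAR_DIR:-$HOME/biosoft/annovar}"   # assumed install path

download_dbs() {
  local build="$1"; shift              # e.g. hg38; remaining args are table names
  local db
  for db in "$@"; do
    local cmd=("$ANNOVAR_DIR/annotate_variation.pl" -downdb -buildver "$build"
               -webfrom annovar "$db" "$ANNOVAR_DIR/humandb/")
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "${cmd[*]}"
    else
      # one log file per table, and a message on stderr if a download fails
      "${cmd[@]}" > "download_${build}_${db}.log" 2>&1 || echo "failed: $db" >&2
    fi
  done
}
```

For example, `DRY_RUN=1 download_dbs hg38 ensGene revel` previews the commands; the same call without DRY_RUN can be run from a script under nohup. Running sequentially is slower but avoids many large downloads competing for bandwidth, and a failed table is easy to spot in its own log.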

Sources of the database files:

https://annovar.openbioinformatics.org/en/latest/user-guide/download/

- For gene-based annotation

Build	Table Name	Explanation	Date
hg18	refGene	FASTA sequences for all annotated transcripts in RefSeq Gene	20190929
hg19	refGene	same as above	20190929
hg38	refGene	same as above	20190929
hg18	refGeneWithVer	FASTA sequences for all annotated transcripts in RefSeq Gene with version number	20190929
hg19	refGeneWithVer	same as above	20190929
hg38	refGeneWithVer	same as above	20190929
hg18	knownGene	FASTA sequences for all annotated transcripts in UCSC Known Gene	20190929
hg19	knownGene	same as above	20190929
hg38	knownGene	same as above	20190929
hg18	ensGene	FASTA sequences for all annotated transcripts in Gencode v31 Basic collection	20190929
hg19	ensGene	same as above	20190929
hg38	ensGene	same as above	20190929

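The annotation log further down shows that a gene-based table such as refGene is read from two files in humandb/: hg38_refGene.txt and hg38_refGeneMrna.fa. A small pre-flight check along those lines, assuming the same naming pattern holds for knownGene and ensGene (as it does in the log), could look like this; check_gene_db is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: a gene-based table <table> for <build> is
# assumed to need <humandb>/<build>_<table>.txt and <build>_<table>Mrna.fa,
# matching the file names printed in the annotation log.
check_gene_db() {
  local humandb="$1" build="$2" table="$3" f missing=0
  for f in "${humandb}/${build}_${table}.txt" "${humandb}/${build}_${table}Mrna.fa"; do
    if [ ! -s "$f" ]; then             # missing or empty file
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_gene_db ~/biosoft/annovar/humandb hg38 ensGene || echo "download ensGene first"`.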
- For filter-based annotation

Build	Table Name	Explanation	Date
hg18	avsift	whole-exome SIFT scores for non-synonymous variants (obsolete and should not be used any more)	20120222
hg19	avsift	same as above	20120222
hg18	ljb26_all	whole-exome SIFT, PolyPhen2 HDIV, PolyPhen2 HVAR, LRT, MutationTaster, MutationAssessor, FATHMM, MetaSVM, MetaLR, VEST, CADD, GERP++, PhyloP and SiPhy scores from dbNSFP version 2.6	20140925
hg19	ljb26_all	same as above	20140925
hg38	ljb26_all	same as above	20150520
hg18	dbnsfp30a	whole-exome SIFT, PolyPhen2 HDIV, PolyPhen2 HVAR, LRT, MutationTaster, MutationAssessor, FATHMM, MetaSVM, MetaLR, VEST, CADD, GERP++, DANN, fitCons, PhyloP and SiPhy scores from dbNSFP version 3.0a	20151015
hg19	dbnsfp30a	same as above	20151015
hg38	dbnsfp30a	same as above	20151015
hg19	dbnsfp31a_interpro	protein domain for variants	20151219
hg38	dbnsfp31a_interpro	same as above	20151219
hg18	dbnsfp33a	whole-exome SIFT, PolyPhen2 HDIV, PolyPhen2 HVAR, LRT, MutationTaster, MutationAssessor, FATHMM, PROVEAN, MetaSVM, MetaLR, VEST, M-CAP, CADD, GERP++, DANN, fathmm-MKL, Eigen, GenoCanyon, fitCons, PhyloP and SiPhy scores from dbNSFP version 3.3a	20170221
hg19	dbnsfp33a	same as above	20170221
hg38	dbnsfp33a	same as above	20170221
hg18	dbnsfp35a	same as above	20180921
hg19	dbnsfp35a	same as above	20180921
hg38	dbnsfp35a	same as above	20180921
hg18	dbnsfp35c	same as above, suitable for commercial use	20181023
hg19	dbnsfp35c	same as above	20181023
hg38	dbnsfp35c	same as above	20181023
hg19	dbscsnv11	dbscSNV version 1.1 for splice site prediction by AdaBoost and Random Forest	20151218
hg38	dbscsnv11	same as above	20151218
hg19	intervar_20170202	InterVar: clinical interpretation of missense variants (indels not supported)	20170202
hg19	intervar_20180118	InterVar: clinical interpretation of missense variants (indels not supported)	20180325
hg38	intervar_20180118	InterVar: clinical interpretation of missense variants (indels not supported)	20180325
hg18	cg46	alternative allele frequency in 46 unrelated human subjects sequenced by Complete Genomics	20120222
hg19	cg46	same as above	index updated 2012Feb22
hg18	cg69	allele frequency in 69 human subjects sequenced by Complete Genomics	20120222
hg19	cg69	same as above	20120222
hg19	cosmic64	COSMIC database version 64	20130520
hg19	cosmic65	COSMIC database version 65	20130706
hg19	cosmic67	COSMIC database version 67	20131117
hg19	cosmic67wgs	COSMIC database version 67 on WGS data	20131117
hg19	cosmic68	COSMIC database version 68	20140224
hg19	cosmic68wgs	COSMIC database version 68 on WGS data	20140224
hg19	cosmic70	same as above	20140911
hg18	cosmic70	same as above	20150428
hg38	cosmic70	same as above	20150428
hg19/hg38	cosmic71, 72, ..., 80	see the ANNOVAR download page
hg18	esp6500siv2_ea	alternative allele frequency in European American subjects in the NHLBI-ESP project with 6500 exomes, including the indel calls and the chrY calls. This is lifted over from hg19 by myself	20141222
hg19	esp6500siv2_ea	same as above	20141222
hg38	esp6500siv2_ea	same as above, lifted over from hg19 by myself	20141222
hg18	esp6500siv2_aa	alternative allele frequency in African American subjects in the NHLBI-ESP project with 6500 exomes, including the indel calls and the chrY calls. This is lifted over from hg19 by myself.	20141222
hg19	esp6500siv2_aa	same as above	20141222
hg38	esp6500siv2_aa	same as above, lifted over from hg19 by myself	20141222
hg18	esp6500siv2_all	alternative allele frequency in All subjects in the NHLBI-ESP project with 6500 exomes, including the indel calls and the chrY calls. This is lifted over from hg19 by myself.	20141222
hg19	esp6500siv2_all	same as above	20141222
hg38	esp6500siv2_all	same as above, lifted over from hg19 by myself	20141222
hg19	exac03	ExAC 65000 exome allele frequency data for ALL, AFR (African), AMR (Admixed American), EAS (East Asian), FIN (Finnish), NFE (Non-finnish European), OTH (other), SAS (South Asian)). version 0.3. Left normalization done.	20151129
hg18	exac03	same as above	20151129
hg38	exac03	same as above	20151129
hg19	exac03nontcga	ExAC on non-TCGA samples (updated header)	20160423
hg38	exac03nontcga	same as above	20160423
hg19	exac03nonpsych	ExAC on non-Psychiatric disease samples (updated header)	20160423
hg38	exac03nonpsych	same as above	20160423
hg38	exac10	No difference from exac03, which is based on this; use exac03 instead	X
hg19	gene4denovo201907	gene4denovo database	20191101
hg38	gene4denovo201907	gene4denovo database	20191101
hg19	gnomad_exome	gnomAD exome collection (v2.0.1)	20170311
hg38	gnomad_exome	gnomAD exome collection (v2.0.1)	20170311
hg19	gnomad_genome	gnomAD genome collection (v2.0.1)	20170311
hg38	gnomad_genome	gnomAD genome collection (v2.0.1)	20170311
hg19	gnomad211_exome	gnomAD exome collection (v2.1.1), with "AF AF_popmax AF_male AF_female AF_raw AF_afr AF_sas AF_amr AF_eas AF_nfe AF_fin AF_asj AF_oth non_topmed_AF_popmax non_neuro_AF_popmax non_cancer_AF_popmax controls_AF_popmax" header	20190318
hg19	gnomad211_genome	same as above	20190323
hg38	gnomad211_exome	same as above	20190409
hg38	gnomad211_genome	same as above	20190409
hg38	gnomad30_genome	version 3.0 whole-genome data	20191104
hg19	kaviar_20150923	170 million Known VARiants from 13K genomes and 64K exomes in 34 projects	20151203
hg38	kaviar_20150923	same as above	20151203
hg19	hrcr1	40 million variants from 32K samples in haplotype reference consortium	20151203
hg38	hrcr1	same as above	20151203
hg19	abraom	2.3 million Brazilian genomic variants	20181204
hg38	abraom	liftOver from above	20181204
hg18	1000g (3 data sets)	alternative allele frequency data in 1000 Genomes Project	20120222
hg18	1000g2010 (3 data sets)	same as above	20120222
hg18	1000g2010jul (3 data sets)	same as above	20120222
hg18	1000g2012apr	I lifted over the latest 1000 Genomes Project data to hg18, to help researchers working with hg18 coordinates	20120820
hg19	1000g2010nov	same as above	20120222
hg19	1000g2011may	same as above	20120222
hg19	1000g2012feb	same as above	20130308
hg18	1000g2012apr (5 data sets)	This is done by liftOver of the hg19 data below. It contains alternative allele frequency data in 1000 Genomes Project for ALL, AMR (admixed american), EUR (european), ASN (asian), AFR (african) populations	20130508
hg19	1000g2012apr (5 data sets)	alternative allele frequency data in 1000 Genomes Project for ALL, AMR (admixed american), EUR (european), ASN (asian), AFR (african) populations	20120525
hg19	1000g2014aug (6 data sets)	alternative allele frequency data in 1000 Genomes Project for autosomes (ALL, AFR (African), AMR (Admixed American), EAS (East Asian), EUR (European), SAS (South Asian)). Based on 201408 collection v4 (based on 201305 alignment)	20140915
hg19	1000g2014sep (6 data sets)	alternative allele frequency data in 1000 Genomes Project for autosomes (ALL, AFR (African), AMR (Admixed American), EAS (East Asian), EUR (European), SAS (South Asian)). Based on 201409 collection v5 (based on 201305 alignment)	20140925
hg19	1000g2014oct (6 data sets)	alternative allele frequency data in 1000 Genomes Project for autosomes (ALL, AFR (African), AMR (Admixed American), EAS (East Asian), EUR (European), SAS (South Asian)). Based on 201409 collection v5 (based on 201305 alignment) but including chrX and chrY data finally!	20141216
hg18	1000g2014oct (6 data sets)	same as above	20150428
hg38	1000g2014oct (6 data sets)	same as above	20150424
hg19	1000g2015aug (6 data sets)	The 1000G team fixed a bug in chrX frequency calculation. Based on 201508 collection v5b (based on 201305 alignment)	20150824
hg38	1000g2015aug (6 data sets)	same as above	20150824
hg19	gme	Great Middle East allele frequency including NWA (northwest Africa), NEA (northeast Africa), AP (Arabian peninsula), Israel, SD (Syrian desert), TP (Turkish peninsula) and CA (Central Asia)	20161024
hg38	gme	same as above	20161024
hg19	mcap	M-CAP scores for non-synonymous variants	20161104
hg38	mcap	same as above	20161104
hg19	mcap13	M-CAP scores v1.3	20181203
hg19	revel	REVEL scores for non-synonymous variants	20161205
hg38	revel	same as above	20161205
hg18	snp128	dbSNP with ANNOVAR index files	20120222
hg18	snp129	same as above	20120222
hg19	snp129	liftover from hg18_snp129.txt	20120809
hg18	snp130	same as above	20120222
hg19	snp130	same as above	20120222
hg18	snp131	same as above	20120222
hg19	snp131	same as above	20120222
hg18	snp132	same as above	20120222
hg19	snp132	same as above	20120222
hg18	snp135	I lifted over SNP135 to hg18	20120820
hg19	snp135	same as above	20120222
hg19	snp137	same as above	20130109
hg18	snp138	I lifted over SNP138 to hg18	20140910
hg19	snp138	same as above	file and index updated 20140910
hg19	avsnp138	dbSNP138 with allelic splitting and left-normalization	20141223
hg19	avsnp142	dbSNP142 with allelic splitting and left-normalization	20141228
hg19	avsnp144	dbSNP144 with allelic splitting and left-normalization (careful with bugs!)	20151102
hg38	avsnp144	same as above	20151102
hg19	avsnp147	dbSNP147 with allelic splitting and left-normalization	20160606
hg38	avsnp142	dbSNP142 with allelic splitting and left-normalization	20160106
hg38	avsnp144	dbSNP144 with allelic splitting and left-normalization	20151102
hg38	avsnp147	dbSNP147 with allelic splitting and left-normalization	20160606
hg19	avsnp150	dbSNP150 with allelic splitting and left-normalization	20170929
hg38	avsnp150	dbSNP150 with allelic splitting and left-normalization	20170929
hg18	snp128NonFlagged	dbSNP with ANNOVAR index files, after removing those flagged SNPs (SNPs < 1% minor allele frequency (MAF) (or unknown), mapping only once to reference assembly, flagged in dbSnp as "clinically associated")	20120524
hg18	snp129NonFlagged	same as above	20120524
hg18	snp130NonFlagged	same as above	20120524
hg19	snp130NonFlagged	same as above	20120524
hg18	snp131NonFlagged	same as above	20120524
hg19	snp131NonFlagged	same as above	20120524
hg18	snp132NonFlagged	same as above	20120524
hg19	snp132NonFlagged	same as above	20120524
hg19	snp135NonFlagged	same as above	20120524
hg19	snp137NonFlagged	same as above	20130109
hg19	snp138NonFlagged	same as above	20140222
hg19	nci60	NCI-60 human tumor cell line panel exome sequencing allele frequency data	20130724
hg18	nci60	same as above	20150428
hg38	nci60	same as above	20150428
hg19	icgc21	International Cancer Genome Consortium version 21	20160622
hg19	clinvar_20131105	CLINVAR database with Variant Clinical Significance (unknown, untested, non-pathogenic, probable-non-pathogenic, probable-pathogenic, pathogenic, drug-response, histocompatibility, other) and Variant disease name	20140430
hg19	clinvar_20140211	same as above	20140430
hg19	clinvar_20140303	same as above	20140430
hg19	clinvar_20140702	same as above	20140712
hg38	clinvar_20140702	same as above	20140712
hg19	clinvar_20140902	same as above	20140911
hg38	clinvar_20140902	same as above	20140911
hg19	clinvar_20140929	same as above	20141002
hg19	clinvar_20150330	same as above but with variant normalization	20150413
hg38	clinvar_20150330	same as above but with variant normalization	20150413
hg19	clinvar_20150629	same as above but with variant normalization	20150724
hg38	clinvar_20150629	same as above but with variant normalization	20150724
hg19	clinvar_20151201	Clinvar version 20151201 with separate columns (CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID)	20160303
hg38	clinvar_20151201	same as above	20160303
hg19	clinvar_20160302	Clinvar version 20160302 with separate columns (CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID)	20171003
hg38	clinvar_20160302	same as above (updated 20171003 to handle multi-allelic variants)	20171003
hg19	clinvar_20161128	Clinvar version 20161128 with separate columns (CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID)	20171003
hg38	clinvar_20161128	same as above (updated 20170215 to add missing header line; 20171003 to handle multi-allelic variants)	20171003
hg19	clinvar_20170130	Clinvar version 20170130 with separate columns (CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID)	20171003
hg38	clinvar_20170130	same as above	20171003
hg19	clinvar_20170501	Clinvar version 20170501 with separate columns (CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID)	20171003
hg38	clinvar_20170501	same as above	20171003
hg19	clinvar_20170905	Clinvar version 20170905 with separate columns (CLINSIG CLNDBN CLNACC CLNDSDB CLNDSDBID)	20171003
hg38	clinvar_20170905	same as above	20171003
hg19	clinvar_20180603	Clinvar version 20180603 with separate columns (CLNALLELEID CLNDN CLNDISDB CLNREVSTAT CLNSIG)	20180708
hg38	clinvar_20180603	same as above	20180708
hg19	clinvar_20190305	Clinvar version 20190305 with separate columns (CLNALLELEID CLNDN CLNDISDB CLNREVSTAT CLNSIG)	20190311
hg38	clinvar_20190305	same as above	20190316
hg19	clinvar_20200316	Clinvar version 20200316 with separate columns (CLNALLELEID CLNDN CLNDISDB CLNREVSTAT CLNSIG)	20200401
hg38	clinvar_20200316	same as above	20200401
hg19	popfreq_max_20150413	A database containing the maximum allele frequency from 1000G, ESP6500, ExAC and CG46	20150413
hg19	popfreq_all_20150413	A database containing all allele frequency from 1000G, ESP6500, ExAC and CG46	20150413
hg19	mitimpact2	pathogenicity predictions of human mitochondrial missense variants	20150520
hg19	mitimpact24	same as above with version 2.4	20160123
hg19	regsnpintron	prioritize the disease-causing probability of intronic SNVs	20180920
hg38	regsnpintron	liftOver of above	20180922
hg18	gerp++elem	conserved genomic regions by GERP++	20140223
hg19	gerp++elem	same as above	20140223
mm9	gerp++elem	same as above	20140223
hg18	gerp++gt2	whole-genome GERP++ scores greater than 2 (RS score threshold of 2 provides high sensitivity while still strongly enriching for truly constrained sites)	20120621
hg19	gerp++gt2	same as above	20120621
hg19	caddgt20	CADD with score>20	20160607
hg19	caddgt10	CADD with score>10	20160607
hg19	cadd	CADD	20140223
hg19	cadd13	CADD version 1.3	20170123
hg19	cadd13gt10	CADD version 1.3 score>10	20170123
hg19	cadd13gt20	CADD version 1.3 score>20	20170123
hg19	caddindel	removed	20150505
hg19	fathmm	whole-genome FATHMM_coding and FATHMM_noncoding scores (noncoding and coding scores in the 2015 version was reversed)	20160315
hg19	gwava	whole genome GWAVA_region_score and GWAVA_tss_score (GWAVA_unmatched_score has bug in file), see ref.	20150623
hg19	eigen	whole-genome Eigen scores, see ref	20160330
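
Filter-based tables are read from humandb/ by file name. In the log below most protocols map to <build>_<protocol>.txt (for example hg38_avsnp150.txt), while the 1000 Genomes protocol 1000g2015aug_all maps to hg38_ALL.sites.2015_08.txt. The hypothetical helper filter_db_file below encodes that pattern so you can check that a file exists before running table_annovar.pl. The 1000g rule is generalized from that single log example and is an assumption for other releases and populations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess which humandb file a filter-based protocol reads.
# Default: <build>_<protocol>.txt. 1000 Genomes protocols such as
# 1000g2015aug_all are assumed to map to <build>_<POP>.sites.<YYYY>_<MM>.txt.
filter_db_file() {
  local build="$1" protocol="$2"
  local re='^1000g([0-9]{4})([a-z]{3})_([a-z]+)$'
  if [[ $protocol =~ $re ]]; then
    local year="${BASH_REMATCH[1]}" mon="${BASH_REMATCH[2]}" pop="${BASH_REMATCH[3]}"
    pop=$(printf '%s' "$pop" | tr '[:lower:]' '[:upper:]')   # all -> ALL
    local months=(jan feb mar apr may jun jul aug sep oct nov dec)
    local i
    for i in "${!months[@]}"; do
      if [ "${months[$i]}" = "$mon" ]; then
        printf '%s_%s.sites.%s_%02d.txt\n' "$build" "$pop" "$year" $((i + 1))
        return 0
      fi
    done
    return 1   # unrecognized month abbreviation
  fi
  printf '%s_%s.txt\n' "$build" "$protocol"
}
```

Usage: `[ -s ~/biosoft/annovar/humandb/"$(filter_db_file hg38 gnomad30_genome)" ] || echo "gnomad30_genome not downloaded"`.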

User-contributed datasets

Several generous ANNOVAR users provide additional annotation datasets that may help other users. These datasets are described below:

MitImpact2: pathogenicity predictions of human mitochondrial missense variants. This is prepared as filter-based annotation format and users can directly download from ANNOVAR (see table above).
regsnpintron: regSNP-intron uses a machine learning algorithm to prioritize the disease-causing probability of intronic SNVs. The columns are "fpr (False positive rate), disease Disease category (B: benign [FPR > 0.1]; PD: Possibly Damaging [0.05 < FPR <= 0.1]; D: Damaging [FPR <= 0.05]), splicing_site Splicing site (on/off). Splicing sites are defined as -3 to +7 for donor sites, -13 to +1 for acceptor sites.". This is prepared as filter-based annotation format and users can directly download from ANNOVAR (see table above).
LoFtool score: gene loss-of-function score percentiles. The smaller the percentile, the more intolerant the gene is to functional variation. The file can be downloaded here. Manuscript in preparation (please contact Dr. Joao Fadista - joao.fadista@med.lu.se). The authors would like to thank the Exome Aggregation Consortium and the groups that provided exome variant data for comparison. A full list of contributing groups can be found at http://exac.broadinstitute.org/about.
RVIS-ESV score: RVIS score measures genetic intolerance of genes to functional mutations, as described in Petrovski et al. Original RVIS was constructed based on patterns of standing variation in 6503 samples. The authors have recently constructed scores based on the ~61,000 samples from ExAC. There is high correlation, but more resolution for many genes. The ExAC cohort implementation is what we consider RVIS (v2). It can be downloaded here.
GDI score: the gene damage index (GDI) describes the accumulated mutational damage of each human gene in the general population. Highly mutated/damaged genes are unlikely to be disease-causing, yet they harbor a large proportion of false-positive variants. Removing high-GDI genes is therefore a very effective way to confidently remove false positives from WES/WGS data. More details are given in this paper. The data set includes general damage predictions (low/medium/high) for different disease types (all, Mendelian, cancer, and PID) and can be downloaded from here.
TMC-SNPDB: SNP database from whole exome data of 62 normal samples derived from cancer patients of Indian origin, representing 114,309 unique germline variants. Read the manuscript here. It is useful for exome sequencing studies on Indian populations and can be downloaded from here.
GenoNet Scores: cell-specific functional elements predicted by GenoNet organized by chromosomes in many cell types. You must use the specific link to download the files.
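
The regSNP-intron categories described above are simple FPR cutoffs. This hypothetical helper, regsnp_category, maps one FPR value to its category so the thresholds are explicit; it assumes the FPR is passed as a plain number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a regSNP-intron false positive rate (FPR) to the
# disease category defined in the regsnpintron description:
#   D  (Damaging)          FPR <= 0.05
#   PD (Possibly Damaging) 0.05 < FPR <= 0.1
#   B  (Benign)            FPR > 0.1
regsnp_category() {
  awk -v fpr="$1" 'BEGIN {
    x = fpr + 0                 # force a numeric comparison
    if (x <= 0.05)      print "D"
    else if (x <= 0.1)  print "PD"
    else                print "B"
  }'
}
```

For example, `regsnp_category 0.07` prints PD.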

Third-party datasets

Several third-party researchers have provided additional annotation datasets that can be used by ANNOVAR directly. However, users need to agree to specific license terms set forth by the third parties:

SPIDEX: SPIDEX 1.0 - Deep Genomics : (Xiong et al, Science 2015) Machine-learning prediction on how genetic variants affect RNA splicing. This dataset can be downloaded here.

Third-party software tools

Customprodbj is a Java-based tool for customized protein database construction. It can build the database on a single or multiple VCF files on single or multiple individuals. It can be accessed here. Command line example: java -jar customprodbj.jar -f input_variant_file_list.txt -d annovar_database/humandb/hg19_refGeneMrna.fa -r annovar_database/humandb/hg19_refGene.txt -t -o out/.

http://www.openbioinformatics.org/annovar/download/RVIS_ExAC_4KW.txt.gz

http://www.pnas.org/content/early/2015/10/14/1518646112.abstract

http://www.openbioinformatics.org/annovar/download/GDI_full_10282015.txt.gz

http://www.openbioinformatics.org/annovar/download/GenoNetScores/ByChr/index.html

http://download.openbioinformatics.org/spidex_download_form.php

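table_annovar.pl (performs all three annotation types in a single run)
The simplest way to use ANNOVAR is to annotate with table_annovar.pl. It accepts several input formats, including VCF, and writes a tab-delimited output file in which each column holds one annotation (or a comma-separated file with -csvout).
Example annotation commands: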

~/biosoft/annovar/table_annovar.pl pooling_variants_all_variants.hg19-hg38.avinput ~/biosoft/annovar/humandb/ -buildver hg38 -out chen_test -remove -protocol refGene -operation g -nastring . -csvout -polish
~/biosoft/annovar/table_annovar.pl pooling_variants_all_variants.hg19-hg38.avinput ~/biosoft/annovar/humandb/ -buildver hg38 -out myanno -remove -protocol refGene,knownGene,ensGene,dbnsfp35a,esp6500siv2_all,exac03,gene4denovo201907,gnomad30_genome,1000g2015aug_all,avsnp150,clinvar_20200316,regsnpintron -operation g,g,g,f,f,f,f,f,f,f,f,f -nastring . -csvout -polish
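# -buildver hg38: use the hg38 genome build
# -out myanno: output files are prefixed with myanno
# -remove: delete the temporary files created during annotation
# -protocol: the databases to annotate against, comma-separated; order matters
# -operation: the type of each database, in the same order as -protocol (g = gene-based, r = region-based, f = filter-based), comma-separated; order matters
# -nastring .: fill missing values with a dot
# -csvout: write the final output as a .csv file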
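
Because -protocol and -operation are matched by position, a miscount between the two lists is an easy mistake. The hypothetical helper check_protocol_operation below checks that both lists have the same number of entries and, as a simplification, only accepts the g, r and f codes listed in the comments above.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check before running table_annovar.pl: the comma-separated
# -protocol and -operation lists must have the same length, and (in this sketch)
# each operation must be one of g, r or f.
check_protocol_operation() {
  local protocol="$1" operation="$2"
  local -a p o
  IFS=',' read -r -a p <<< "$protocol"
  IFS=',' read -r -a o <<< "$operation"
  if [ "${#p[@]}" -ne "${#o[@]}" ]; then
    echo "mismatch: ${#p[@]} protocols vs ${#o[@]} operations" >&2
    return 1
  fi
  local op
  for op in "${o[@]}"; do
    case "$op" in
      g|r|f) ;;
      *) echo "unknown operation: $op" >&2; return 1 ;;
    esac
  done
  return 0
}
```

Usage: run `check_protocol_operation "$PROTOCOL" "$OPERATION"` with the same strings you pass to table_annovar.pl; for the 12-database command above it returns success.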

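The output CSV file contains the five main input columns (chromosome, start, end, reference allele, alternate allele) plus the annotations from each database. table_annovar.pl can also annotate a VCF file directly with the -vcfinput option (no format conversion needed); the annotations are then written to the INFO column of the VCF.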
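
A minimal sketch of the direct-VCF route, reusing the options from the avinput command above but replacing -csvout with -vcfinput. The wrapper name annotate_vcf, the ANNOVAR_DIR default and the DRY_RUN switch are assumptions for illustration, and the build is fixed to hg38 as in these notes.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: annotate a VCF directly with table_annovar.pl -vcfinput
# (no convert2annovar.pl step). DRY_RUN=1 prints the command instead of running it.
ANNOVAR_DIR="${ANNOVAR_DIR:-$HOME/biosoft/annovar}"   # assumed install path

annotate_vcf() {
  local vcf="$1" out="$2" protocol="$3" operation="$4"
  local cmd=("$ANNOVAR_DIR/table_annovar.pl" "$vcf" "$ANNOVAR_DIR/humandb/"
             -buildver hg38 -out "$out" -remove
             -protocol "$protocol" -operation "$operation"
             -nastring . -vcfinput -polish)
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

With -vcfinput, ANNOVAR typically writes both a <prefix>.hg38_multianno.vcf (annotations in INFO) and a <prefix>.hg38_multianno.txt table.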

The command used for this annotation run and its log output:

(base) root@1100150:~/new for annovar# ~/biosoft/annovar/table_annovar.pl pooling_variants_all_variants.hg19-hg38.avinput ~/biosoft/annovar/humandb/ -buildver hg38 -out myanno -remove -protocol refGene,knownGene,ensGene,dbnsfp35a,esp6500siv2_all,exac03,gene4denovo201907,gnomad30_genome,1000g2015aug_all,avsnp150,clinvar_20200316,regsnpintron -operation g,g,g,f,f,f,f,f,f,f,f,f -nastring . -csvout -polish
-----------------------------------------------------------------
NOTICE: Processing operation=g protocol=refGene

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg38 -dbtype refGene -outfile myanno.refGene -exonsort -nofirstcodondel pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/>
NOTICE: Output files are written to myanno.refGene.variant_function, myanno.refGene.exonic_variant_function
NOTICE: Reading gene annotation from /root/biosoft/annovar/humandb/hg38_refGene.txt ... Done with 82500 transcripts (including 20366 without coding sequence annotation) for 28265 unique genes
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Reading FASTA sequences from /root/biosoft/annovar/humandb/hg38_refGeneMrna.fa ... Done with 803 sequences
WARNING: A total of 591 sequences will be ignored due to lack of correct ORF annotation

NOTICE: Running with system command <coding_change.pl myanno.refGene.exonic_variant_function.orig /root/biosoft/annovar/humandb//hg38_refGene.txt /root/biosoft/annovar/humandb//hg38_refGeneMrna.fa -alltranscript -out myanno.refGene.fa -newevf myanno.refGene.exonic_variant_function>
-----------------------------------------------------------------
NOTICE: Processing operation=g protocol=knownGene

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg38 -dbtype knownGene -outfile myanno.knownGene -exonsort -nofirstcodondel pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/>
NOTICE: Output files are written to myanno.knownGene.variant_function, myanno.knownGene.exonic_variant_function
NOTICE: Reading gene annotation from /root/biosoft/annovar/humandb/hg38_knownGene.txt ... Done with 226811 transcripts (including 118121 without coding sequence annotation) for 74691 unique genes
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Reading FASTA sequences from /root/biosoft/annovar/humandb/hg38_knownGeneMrna.fa ... Done with 1335 sequences
WARNING: A total of 8181 sequences will be ignored due to lack of correct ORF annotation

NOTICE: Running with system command <coding_change.pl myanno.knownGene.exonic_variant_function.orig /root/biosoft/annovar/humandb//hg38_knownGene.txt /root/biosoft/annovar/humandb//hg38_knownGeneMrna.fa -alltranscript -out myanno.knownGene.fa -newevf myanno.knownGene.exonic_variant_function>
-----------------------------------------------------------------
NOTICE: Processing operation=g protocol=ensGene

NOTICE: Running with system command <annotate_variation.pl -geneanno -buildver hg38 -dbtype ensGene -outfile myanno.ensGene -exonsort -nofirstcodondel pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/>
NOTICE: Output files are written to myanno.ensGene.variant_function, myanno.ensGene.exonic_variant_function
NOTICE: Reading gene annotation from /root/biosoft/annovar/humandb/hg38_ensGene.txt ... Done with 89732 transcripts (including 28806 without coding sequence annotation) for 42087 unique genes
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Reading FASTA sequences from /root/biosoft/annovar/humandb/hg38_ensGeneMrna.fa ... Done with 606 sequences
WARNING: A total of 214 sequences cannot be found in /root/biosoft/annovar/humandb/hg38_ensGeneMrna.fa
(example: ENST00000293894.3#16#981807 ENST00000349496.9#3#41199438 ENST00000255192.7#5#79069716)
WARNING: A total of 385 sequences will be ignored due to lack of correct ORF annotation

NOTICE: Running with system command <coding_change.pl myanno.ensGene.exonic_variant_function.orig /root/biosoft/annovar/humandb//hg38_ensGene.txt /root/biosoft/annovar/humandb//hg38_ensGeneMrna.fa -alltranscript -out myanno.ensGene.fa -newevf myanno.ensGene.exonic_variant_function>
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=dbnsfp35a
NOTICE: Finished reading 70 column headers for '-dbtype dbnsfp35a'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype dbnsfp35a -buildver hg38 -outfile myanno pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/ -otherinfo>
NOTICE: Output file with variants matching filtering criteria is written to myanno.hg38_dbnsfp35a_dropped, and output file with other variants is written to myanno.hg38_dbnsfp35a_filtered
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Database index loaded. Total number of bins is 552168 and the number of bins to be scanned is 2918
NOTICE: Scanning filter database /root/biosoft/annovar/humandb/hg38_dbnsfp35a.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=esp6500siv2_all

NOTICE: Running system command <annotate_variation.pl -filter -dbtype esp6500siv2_all -buildver hg38 -outfile myanno pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/>
NOTICE: the --dbtype esp6500siv2_all is assumed to be in generic ANNOVAR database format
NOTICE: Output file with variants matching filtering criteria is written to myanno.hg38_esp6500siv2_all_dropped, and output file with other variants is written to myanno.hg38_esp6500siv2_all_filtered
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Database index loaded. Total number of bins is 683825 and the number of bins to be scanned is 3065
NOTICE: Scanning filter database /root/biosoft/annovar/humandb/hg38_esp6500siv2_all.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=exac03
NOTICE: Finished reading 8 column headers for '-dbtype exac03'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype exac03 -buildver hg38 -outfile myanno pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/ -otherinfo>
NOTICE: Output file with variants matching filtering criteria is written to myanno.hg38_exac03_dropped, and output file with other variants is written to myanno.hg38_exac03_filtered
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Database index loaded. Total number of bins is 749044 and the number of bins to be scanned is 3310
NOTICE: Scanning filter database /root/biosoft/annovar/humandb/hg38_exac03.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=gene4denovo201907
NOTICE: Finished reading 6 column headers for '-dbtype gene4denovo201907'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype gene4denovo201907 -buildver hg38 -outfile myanno pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/ -otherinfo>
NOTICE: the --dbtype gene4denovo201907 is assumed to be in generic ANNOVAR database format
NOTICE: Output file with variants matching filtering criteria is written to myanno.hg38_gene4denovo201907_dropped, and output file with other variants is written to myanno.hg38_gene4denovo201907_filtered
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Database index loaded. Total number of bins is 501939 and the number of bins to be scanned is 848
NOTICE: Scanning filter database /root/biosoft/annovar/humandb/hg38_gene4denovo201907.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=gnomad30_genome
NOTICE: Finished reading 13 column headers for '-dbtype gnomad30_genome'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype gnomad30_genome -buildver hg38 -outfile myanno pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/ -otherinfo>
NOTICE: Output file with variants matching filtering criteria is written to myanno.hg38_gnomad30_genome_dropped, and output file with other variants is written to myanno.hg38_gnomad30_genome_filtered
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Database index loaded. Total number of bins is 2860873 and the number of bins to be scanned is 3049
NOTICE: Scanning filter database /root/biosoft/annovar/humandb/hg38_gnomad30_genome.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=1000g2015aug_all

NOTICE: Running system command <annotate_variation.pl -filter -dbtype 1000g2015aug_all -buildver hg38 -outfile myanno pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/>
NOTICE: Output file with variants matching filtering criteria is written to myanno.hg38_ALL.sites.2015_08_dropped, and output file with other variants is written to myanno.hg38_ALL.sites.2015_08_filtered
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Database index loaded. Total number of bins is 2821635 and the number of bins to be scanned is 3052
NOTICE: Scanning filter database /root/biosoft/annovar/humandb/hg38_ALL.sites.2015_08.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=avsnp150

NOTICE: Running system command <annotate_variation.pl -filter -dbtype avsnp150 -buildver hg38 -outfile myanno pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/>
NOTICE: Output file with variants matching filtering criteria is written to myanno.hg38_avsnp150_dropped, and output file with other variants is written to myanno.hg38_avsnp150_filtered
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Database index loaded. Total number of bins is 28304406 and the number of bins to be scanned is 9229
NOTICE: Scanning filter database /root/biosoft/annovar/humandb/hg38_avsnp150.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=clinvar_20200316
NOTICE: Finished reading 5 column headers for '-dbtype clinvar_20200316'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype clinvar_20200316 -buildver hg38 -outfile myanno pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/ -otherinfo>
NOTICE: the --dbtype clinvar_20200316 is assumed to be in generic ANNOVAR database format
NOTICE: Output file with variants matching filtering criteria is written to myanno.hg38_clinvar_20200316_dropped, and output file with other variants is written to myanno.hg38_clinvar_20200316_filtered
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Database index loaded. Total number of bins is 72414 and the number of bins to be scanned is 1706
NOTICE: Scanning filter database /root/biosoft/annovar/humandb/hg38_clinvar_20200316.txt...Done
-----------------------------------------------------------------
NOTICE: Processing operation=f protocol=regsnpintron
NOTICE: Finished reading 3 column headers for '-dbtype regsnpintron'

NOTICE: Running system command <annotate_variation.pl -filter -dbtype regsnpintron -buildver hg38 -outfile myanno pooling_variants_all_variants.hg19-hg38.avinput /root/biosoft/annovar/humandb/ -otherinfo>
NOTICE: the --dbtype regsnpintron is assumed to be in generic ANNOVAR database format
NOTICE: Output file with variants matching filtering criteria is written to myanno.hg38_regsnpintron_dropped, and output file with other variants is written to myanno.hg38_regsnpintron_filtered
NOTICE: Processing next batch with 13802 unique variants in 13802 input lines
NOTICE: Database index loaded. Total number of bins is 1162669 and the number of bins to be scanned is 1874
NOTICE: Scanning filter database /root/biosoft/annovar/humandb/hg38_regsnpintron.txt...Done
-----------------------------------------------------------------
NOTICE: Multianno output file is written to myanno.hg38_multianno.csv
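The last line of the log points to the merged result, myanno.hg38_multianno.csv. It has one row per input variant and one or more columns per protocol. A quick sanity check is to count how many rows have a value in a given column, and the minimal Python sketch below does that. It makes three assumptions. First, the file is comma-separated with a header row. Second, a missing annotation is written as "." or left empty, which depends on the -nastring setting. Third, the column name you pass (for example avsnp150) matches the header of your own file, so check that header first. The script name summarize_multianno.py is hypothetical.

```python
import csv
import sys
from collections import Counter
from typing import Iterable, Tuple

# Cell values treated as "no annotation". ANNOVAR writes whatever -nastring
# was set to (often "."); empty cells are also treated as missing (assumption).
MISSING = {"", "."}


def summarize_column(lines: Iterable[str], column: str) -> Tuple[int, int, Counter]:
    """Return (total rows, rows with a value in `column`, counts per value).

    `lines` may be an open file object or any iterable of CSV text lines.
    """
    reader = csv.DictReader(lines)
    # Accessing fieldnames reads the header row; None means the input was empty.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header")
    total = 0
    annotated = 0
    values: Counter = Counter()
    for row in reader:
        total += 1
        # Short rows give None for missing cells, so normalise to "".
        value = (row[column] or "").strip()
        if value not in MISSING:
            annotated += 1
            values[value] += 1
    return total, annotated, values


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage:
    #   python summarize_multianno.py myanno.hg38_multianno.csv avsnp150
    path, column = sys.argv[1], sys.argv[2]
    with open(path, newline="") as handle:
        total, annotated, counts = summarize_column(handle, column)
    print(f"{annotated}/{total} variants have a value in {column}")
    for value, n in counts.most_common(5):
        print(f"  {value}\t{n}")
```

Pointing the same function at a ClinVar column such as CLNSIG gives a rough breakdown of clinical significance across the 13802 variants.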

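The log also shows that each filter step writes two side files. Variants found in the database go to a `_dropped` file, for example myanno.hg38_avsnp150_dropped. The remaining variants go to a `_filtered` file. The minimal sketch below reads a `_dropped` file into a dictionary keyed by variant. It assumes the tab-separated layout is: database name, matched value, then the avinput columns chr, start, end, ref and alt. Verify this against a few lines of your own output before relying on it. The script name parse_dropped.py is hypothetical.

```python
import sys
from typing import Dict, Iterable, Tuple

# (chr, start, end, ref, alt) as they appear in the avinput columns.
VariantKey = Tuple[str, str, str, str, str]


def parse_dropped(lines: Iterable[str]) -> Dict[VariantKey, str]:
    """Map each variant in an ANNOVAR *_dropped file to its database value.

    Assumed layout (tab-separated): database name, matched value, then the
    avinput columns chr, start, end, ref, alt, and any extra columns.
    """
    hits: Dict[VariantKey, str] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        key = (fields[2], fields[3], fields[4], fields[5], fields[6])
        hits[key] = fields[1]
    return hits


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python parse_dropped.py myanno.hg38_avsnp150_dropped
    with open(sys.argv[1]) as handle:
        hits = parse_dropped(handle)
    print(f"{len(hits)} variants matched in {sys.argv[1]}")
```

Comparing `len(hits)` with the 13802 input lines reported in the log gives a rough idea of how many variants each database matched.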