Commonly used reference genomes come mainly from three sources:
- Ensembl: ftp://ftp.ensembl.org/pub/
- UCSC: http://hgdownload.cse.ucsc.edu/downloads.html
- NCBI: ftp://ftp.ncbi.nih.gov/genomes/
In general, two files are needed: the genome sequence in FASTA format and the gene annotation in GTF format.
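Once downloaded, the two files can be sanity-checked with a few shell one-liners. The functions below are a minimal sketch with hypothetical names (not part of any tool). They list the sequence names in a FASTA file, count GTF records by feature type (column 3), and report GTF sequence names that have no matching FASTA sequence. The last check matters because the FASTA and GTF should come from the same source and release. For example, UCSC names chromosomes `chr1` while Ensembl uses `1`. The sketch assumes standard `gzip`, `awk`, `sed` and `sort`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking a downloaded FASTA/GTF pair.
# `gzip -dcf` decompresses .gz input and passes plain text through unchanged,
# so every function accepts compressed or uncompressed files.

# Print each sequence name (first word of a '>' header line) in a FASTA file.
fasta_seq_names() {
  gzip -dcf "$1" | awk '/^>/ { sub(/^>/, ""); print $1 }'
}

# Count GTF records by feature type (column 3), skipping '#' header lines.
gtf_feature_counts() {
  gzip -dcf "$1" |
    awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (f in n) print f, n[f] }' |
    sort
}

# Print GTF sequence names (column 1) that have no matching FASTA sequence.
# FASTA names are streamed first with a '>' prefix so a single awk pass can
# tell them apart from GTF lines. Prints nothing when the naming agrees.
missing_seqnames() {
  { fasta_seq_names "$1" | sed 's/^/>/'; gzip -dcf "$2"; } |
    awk -F '\t' '/^>/ { seen[substr($0, 2)] = 1; next }
                 !/^#/ && NF >= 9 && !($1 in seen) && !($1 in printed) { print $1; printed[$1] = 1 }'
}
```

For example, `missing_seqnames Homo_sapiens.GRCh38.dna.toplevel.fa.gz Homo_sapiens.GRCh38.99.chr.gtf.gz` should print nothing for a matching Ensembl pair, while pairing an Ensembl FASTA with a UCSC-style GTF would list names such as `chr1`.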
The most commonly used versions can be downloaded from Ensembl, for example:
hg38 (human; Ensembl assembly name GRCh38)
Genome, FASTA format
wget ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
Annotation, GTF format
wget ftp://ftp.ensembl.org/pub/release-99/gtf/homo_sapiens/Homo_sapiens.GRCh38.99.chr.gtf.gz
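Ensembl download directories also contain a `CHECKSUMS` file that can be used to confirm a download is complete. The helper below is a minimal sketch with a hypothetical name (`verify_sum`). It assumes each `CHECKSUMS` line reads `<checksum> <blocks> <filename>`, the output format of the Unix `sum` command; this is worth confirming against the README in the same directory before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check a downloaded file against an Ensembl CHECKSUMS file.
# Assumes each CHECKSUMS line reads "<checksum> <blocks> <filename>", i.e. the
# output format of the Unix `sum` command (GNU sum defaults to the BSD algorithm).

# Usage: verify_sum <downloaded_file> <CHECKSUMS_file>
# Prints OK and returns 0 on a match; prints MISMATCH to stderr and returns 1 otherwise.
verify_sum() {
  local file=$1 checksums=$2 name expected actual
  name=$(basename "$file")
  # Expected "<checksum> <blocks>" for this file name; "+ 0" drops leading zeros.
  expected=$(awk -v f="$name" '$3 == f { print $1 + 0, $2 + 0 }' "$checksums")
  # Recompute both values locally and normalise them the same way.
  actual=$(sum "$file" | awk '{ print $1 + 0, $2 + 0 }')
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $name"
    return 0
  fi
  echo "MISMATCH: $name (expected '$expected', got '$actual')" >&2
  return 1
}
```

Assuming `CHECKSUMS` sits next to the FASTA file on the server, usage would be `wget ftp://ftp.ensembl.org/pub/release-99/fasta/homo_sapiens/dna/CHECKSUMS` followed by `verify_sum Homo_sapiens.GRCh38.dna.toplevel.fa.gz CHECKSUMS`.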
mus_musculus (mouse, GRCm38)
Genome, FASTA format
wget ftp://ftp.ensembl.org/pub/release-99/fasta/mus_musculus/dna/Mus_musculus.GRCm38.dna.toplevel.fa.gz
Annotation, GTF format
wget ftp://ftp.ensembl.org/pub/release-99/gtf/mus_musculus/Mus_musculus.GRCm38.99.chr.gtf.gz
caenorhabditis_elegans (C. elegans, WBcel235)
Genome, FASTA format
wget ftp://ftp.ensembl.org/pub/release-99/fasta/caenorhabditis_elegans/dna/Caenorhabditis_elegans.WBcel235.dna.toplevel.fa.gz
Annotation, GTF format
wget ftp://ftp.ensembl.org/pub/release-99/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.99.gtf.gz
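All of the Ensembl commands above follow one path layout. They differ only in the species directory, the assembly name, the release number, and whether the chromosome-only annotation (`.chr.gtf.gz`) is used. A minimal sketch with hypothetical helper names that rebuilds these URLs, so that another species or release can be tried by changing arguments. It assumes other releases keep the same layout, which should be checked on the FTP site first.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not an official Ensembl tool) that rebuild download URLs
# from the layout used in the wget commands above:
#   <base>/release-<R>/fasta/<species>/dna/<Species>.<assembly>.dna.toplevel.fa.gz
#   <base>/release-<R>/gtf/<species>/<Species>.<assembly>.<R>[.chr].gtf.gz
ENSEMBL_BASE=${ENSEMBL_BASE:-ftp://ftp.ensembl.org/pub}

# homo_sapiens -> Homo_sapiens (file names capitalize the first letter)
capitalize() {
  printf '%s\n' "$1" | awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }'
}

# Usage: ensembl_fasta_url <species> <assembly> <release>
ensembl_fasta_url() {
  printf '%s/release-%s/fasta/%s/dna/%s.%s.dna.toplevel.fa.gz\n' \
    "$ENSEMBL_BASE" "$3" "$1" "$(capitalize "$1")" "$2"
}

# Usage: ensembl_gtf_url <species> <assembly> <release> [chr]
# A 4th argument "chr" selects the chromosome-only annotation (*.chr.gtf.gz).
ensembl_gtf_url() {
  local suffix="gtf.gz"
  if [ "${4:-}" = "chr" ]; then suffix="chr.gtf.gz"; fi
  printf '%s/release-%s/gtf/%s/%s.%s.%s.%s\n' \
    "$ENSEMBL_BASE" "$3" "$1" "$(capitalize "$1")" "$2" "$3" "$suffix"
}

# Example: the mouse downloads listed above
# wget "$(ensembl_fasta_url mus_musculus GRCm38 99)"
# wget "$(ensembl_gtf_url mus_musculus GRCm38 99 chr)"
```

Keeping the base URL in one overridable variable means a mirror or a different protocol can be substituted without editing the functions.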