installation
pip3 install CrossMap
download chain files
A chain file describes a pairwise alignment between two reference assemblies. UCSC and Ensembl chain files are available:
UCSC chain files
- Chain files from hs1 (T2T-CHM13) to hg38/hg19/mm10/mm9 (or vice versa): https://hgdownload.soe.ucsc.edu/goldenPath/hs1/liftOver/
- Chain files from hg38 (GRCh38) to hg19 and all other organisms: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/
- Chain files from hg19 (GRCh37) to hg17/hg18/hg38 and all other organisms: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/
- Chain files from mm10 (GRCm38) to mm9 and all other organisms: http://hgdownload.soe.ucsc.edu/goldenPath/mm10/liftOver/
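The file names inside these UCSC directories appear to follow a regular pattern, for example `hg18ToHg19.over.chain.gz` in the usage example. The sketch below builds a download URL from two assembly names under that assumption. The function name `ucsc_chain_url` is hypothetical, and not every assembly pair has a chain file, so check the directory listing before relying on a generated URL.

```python
def ucsc_chain_url(source: str, target: str) -> str:
    """Build a UCSC liftOver chain URL, assuming the <source>To<Target>.over.chain.gz pattern."""
    # The first letter of the target assembly is capitalized: hg18 + hg19 -> hg18ToHg19
    file_name = f"{source}To{target[0].upper()}{target[1:]}.over.chain.gz"
    return f"https://hgdownload.soe.ucsc.edu/goldenPath/{source}/liftOver/{file_name}"


print(ucsc_chain_url("hg38", "hg19"))
# https://hgdownload.soe.ucsc.edu/goldenPath/hg38/liftOver/hg38ToHg19.over.chain.gz
```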
Ensembl chain files
- Human to Human: ftp://ftp.ensembl.org/pub/assembly_mapping/homo_sapiens/
- Mouse to Mouse: ftp://ftp.ensembl.org/pub/assembly_mapping/mus_musculus/
- Other organisms: ftp://ftp.ensembl.org/pub/assembly_mapping/
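To make the idea of a chain file more concrete: in the UCSC chain format, each alignment starts with a header line of the form `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`, followed by lines describing ungapped blocks. The sketch below parses one such header line. The `ChainHeader` type and `parse_chain_header` function are hypothetical names used for illustration, not part of CrossMap, and the example line is made up.

```python
from typing import NamedTuple


class ChainHeader(NamedTuple):
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: int


def parse_chain_header(line: str) -> ChainHeader:
    """Parse one 'chain ...' header line into named fields."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError(f"not a chain header line: {line!r}")
    f = fields[1:]  # drop the leading 'chain' keyword
    return ChainHeader(
        int(f[0]), f[1], int(f[2]), f[3], int(f[4]), int(f[5]),
        f[6], int(f[7]), f[8], int(f[9]), int(f[10]), int(f[11]),
    )


# Illustrative header, not taken from a real chain file
header = parse_chain_header("chain 1000 chr1 1000 + 0 100 chr1 1200 + 50 150 1")
print(header.t_name, header.t_start, header.t_end, "->", header.q_name, header.q_start, header.q_end)
```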
user input file
CrossMap supports several input file formats, including:
- BED or BED-like (a BED file must have at least the ‘chrom’, ‘start’, and ‘end’ columns)
- Wiggle (the “variableStep”, “fixedStep”, and “bedGraph” formats are supported)
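As a rough illustration of the minimum BED requirement (at least chrom, start, and end), the sketch below checks a single line before it is handed to a liftover tool. The helper `parse_bed3` is a hypothetical name, not a CrossMap function, and it does not handle `track`, `browser`, or comment lines.

```python
from typing import Tuple


def parse_bed3(line: str) -> Tuple[str, int, int]:
    """Return (chrom, start, end) from one BED line, or raise ValueError."""
    # Splitting on any whitespace is enough to read the first three columns
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    chrom = fields[0]
    # int() raises ValueError if start or end is not numeric
    start, end = int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: start={start}, end={end}")
    return chrom, start, end


print(parse_bed3("chr1\t100\t200\tfeature_1"))  # ('chr1', 100, 200)
```

Extra columns are ignored here, which matches the help text's statement that only the first 3 columns are updated by the `bed` command.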
usage
$ CrossMap.py bed hg18ToHg19.over.chain.gz test.hg18.bed3
$ CrossMap.py -h
usage: CrossMap.py [-h] [-v] {bed,bam,gff,wig,bigwig,vcf,gvcf,maf,region,viewchain} ...
CrossMap (v0.6.0) is a program to convert (liftover) genome coordinates between different reference
assemblies (e.g., from human GRCh37/hg19 to GRCh38/hg38 or vice versa). Supported file formats: BAM,
BED, BigWig, CRAM, GFF, GTF, GVCF, MAF (mutation annotation format), SAM, Wiggle, and VCF.
positional arguments:
  {bed,bam,gff,wig,bigwig,vcf,gvcf,maf,region,viewchain}
                        sub-command help
    bed                 converts BED, bedGraph or other BED-like files. Only genome coordinates
                        (i.e., the first 3 columns) will be updated. Regions mapped to multiple
                        locations to the new assembly will be split. Use the "region" command to
                        liftover large genomic regions. Use the "wig" command if you need
                        bedGraph/bigWig output.
    bam                 converts BAM, CRAM, or SAM format file. Genome coordinates, header section,
                        all SAM flags, insert size will be updated.
    gff                 converts GFF or GTF format file. Genome coordinates will be updated.
    wig                 converts Wiggle or bedGraph format file. Genome coordinates will be updated.
    bigwig              converts BigWig file. Genome coordinates will be updated.
    vcf                 converts VCF file. Genome coordinates, header section, reference alleles will
                        be updated.
    gvcf                converts GVCF file. Genome coordinates, header section, reference alleles
                        will be updated.
    maf                 converts MAF (mutation annotation format) file. Genome coordinates and
                        reference alleles will be updated.
    region              converts big genomic regions (in BED format) such as CNV blocks. Genome
                        coordinates will be updated.
    viewchain           prints out the content of a chain file into a human readable, block-to-block
                        format.
optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
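The help text notes that, for the `bed` command, regions mapping to multiple locations in the new assembly are split. The sketch below illustrates that idea on a toy model. It is not CrossMap's implementation: it assumes same-strand, ungapped blocks on a single chromosome, and the `Block` layout and `lift_interval` name are hypothetical.

```python
from typing import List, Tuple

# (source_start, target_start, size): one ungapped aligned block, same strand
Block = Tuple[int, int, int]


def lift_interval(start: int, end: int, blocks: List[Block]) -> List[Tuple[int, int]]:
    """Map the half-open interval [start, end) through the blocks.

    Parts of the interval that fall outside every block are dropped, so one
    input interval can come back as several pieces, or as none.
    """
    pieces = []
    for src_start, dst_start, size in blocks:
        # Overlap between the query interval and this block, in source coordinates
        lo = max(start, src_start)
        hi = min(end, src_start + size)
        if lo < hi:
            offset = dst_start - src_start
            pieces.append((lo + offset, hi + offset))
    return pieces


# Two blocks separated by a gap in the source assembly (positions 50-59 are unaligned)
blocks = [(0, 100, 50), (60, 400, 40)]
print(lift_interval(40, 70, blocks))  # [(140, 150), (400, 410)]
```

In this toy example the interval 40-70 spans the unaligned gap, so it is returned as two pieces. A real chain file also records chromosome names and strand, which this sketch ignores.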
Full documentation: https://crossmap.readthedocs.io/en/latest/