Commonly used tools for converting genomic coordinates between assembly versions include:
1. NCBI Remap
See the previous article: https://www.jianshu.com/p/41e5280f59c3
2. UCSC LiftOver
https://genome.ucsc.edu/cgi-bin/hgLiftOver
3. CrossMap: http://crossmap.sourceforge.net/#installation
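This article focuses on and recommends CrossMap.
It is simple to use: it only needs two input files, a chain file and the file of coordinates to convert.
3.1 Download and installation
(1) Use pip to install CrossMap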
pip3 install git+https://github.com/liguowang/CrossMap.git
or
pip3 install CrossMap #Install CrossMap supporting Python3
or
conda install CrossMap
(2) Install CrossMap from source code
$ tar zxf CrossMap-VERSION.tar.gz
$ cd CrossMap-VERSION
# install CrossMap to default location. In Linux/Unix, this location is like:
# /home/user/lib/python2.7/site-packages/
$ python setup.py install
# or you can install CrossMap to a specified location:
$ python setup.py install --root=/home/user/CrossMap
# setup PYTHONPATH. Skip this step if CrossMap was installed to default location.
$ export PYTHONPATH=/home/user/CrossMap/usr/local/lib/python2.7/site-packages:$PYTHONPATH
# setup PATH. Skip this step if CrossMap was installed to default location.
$ export PATH=/home/user/CrossMap/usr/local/bin:$PATH
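After installation it can be useful to confirm that the CrossMap executable is actually reachable on PATH. The following is a minimal sketch, not part of CrossMap itself. The helper name `find_crossmap` is hypothetical, and the assumption that the script may be installed as either `CrossMap.py` or `CrossMap`, depending on the release, should be checked against your own installation.

```python
import shutil

# Candidate executable names. Which one exists depends on the installed
# CrossMap release (assumption: check your own installation).
CANDIDATES = ("CrossMap.py", "CrossMap")


def find_crossmap(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


if __name__ == "__main__":
    location = find_crossmap()
    if location is None:
        print("CrossMap not found on PATH; check the PATH and PYTHONPATH settings")
    else:
        print("CrossMap found at", location)
```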
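3.2 Download the chain file
The chain file is one of the inputs to the coordinate conversion. It can be downloaded directly from the CrossMap website; just pick the file that matches your source and target assemblies: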
UCSC built chain files (Human, Homo sapiens)
hg38ToHg19.over.chain.gz (Chain file for hg38 to hg19 conversion)
hg19ToHg38.over.chain.gz (Chain file for hg19 to hg38 conversion)
hg18ToHg38.over.chain.gz (Chain file for hg18 to hg38 conversion)
hg19ToHg18.over.chain.gz (Chain file for hg19 to hg18 conversion)
hg19ToHg17.over.chain.gz (Chain file for hg19 to hg17 conversion)
hg18ToHg19.over.chain.gz (Chain file for hg18 to hg19 conversion)
hg18ToHg17.over.chain.gz (Chain file for hg18 to hg17 conversion)
hg17ToHg19.over.chain.gz (Chain file for hg17 to hg19 conversion)
hg17ToHg18.over.chain.gz (Chain file for hg17 to hg18 conversion)
GRCh37ToHg19.over.chain.gz (Chain file for GRCh37 to hg19 conversion)
hg19ToGRCh37.over.chain.gz (Chain file for hg19 to GRCh37 conversion)
UCSC built chain files (Mouse, Mus musculus)
mm10ToMm9.over.chain.gz (Chain file for mm10 to mm9 conversion)
mm9ToMm10.over.chain.gz (Chain file for mm9 to mm10 conversion)
mm9ToMm8.over.chain.gz (Chain file for mm9 to mm8 conversion)
UCSC Chain file of other species can be downloaded from: http://hgdownload.soe.ucsc.edu/downloads.html
The list above mainly covers human conversions. For example, to convert hg38 coordinates to hg19, download hg38ToHg19.over.chain.gz (Chain file for hg38 to hg19 conversion).
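The file names in the list follow a regular pattern: source assembly, `To`, then the target assembly with its first letter capitalized. A small sketch of that naming rule is given below. Both function names are hypothetical, and the URL layout in `ucsc_chain_url` is an assumption based on the UCSC download server; it applies only to UCSC assembly names such as hg19 or mm10 (not GRCh37) and should be verified before use.

```python
def chain_file_name(source, target):
    """Build a chain file name such as hg38ToHg19.over.chain.gz."""
    # Only the first letter of the target assembly is capitalized.
    return f"{source}To{target[0].upper()}{target[1:]}.over.chain.gz"


def ucsc_chain_url(source, target):
    """Assumed UCSC download URL for a chain file (UCSC assembly names only)."""
    base = "https://hgdownload.soe.ucsc.edu/goldenPath"
    return f"{base}/{source}/liftOver/{chain_file_name(source, target)}"


if __name__ == "__main__":
    print(chain_file_name("hg38", "hg19"))
    print(ucsc_chain_url("hg38", "hg19"))
```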
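3.3 Prepare the input BED file
CrossMap accepts several input formats, including BED, BAM, Wiggle, GFF/GTF, VCF and MAF. BED is the most common. The BED file must contain at least 3 tab-separated columns (chr, start, end). It may contain additional columns, such as strand or ref.Function, but no more than 12 columns in total.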
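The column rules above can be checked before running the conversion. The sketch below writes a 3-column BED file and validates each line against the "3 to 12 tab-separated columns" rule. The function names are hypothetical and the coordinates are made up for illustration; this is not how CrossMap itself parses BED files.

```python
import os
import tempfile


def parse_bed_line(line):
    """Split one BED line and check the column and interval constraints."""
    fields = line.rstrip("\n").split("\t")
    if not 3 <= len(fields) <= 12:
        raise ValueError(f"expected 3 to 12 tab-separated columns, got {len(fields)}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval: {start}-{end}")
    # Any extra columns (name, score, strand, ...) are returned unchanged.
    return chrom, start, end, fields[3:]


def write_bed(path, records):
    """Write (chrom, start, end) tuples as a 3-column, tab-separated BED file."""
    with open(path, "w") as handle:
        for chrom, start, end in records:
            handle.write(f"{chrom}\t{start}\t{end}\n")


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    with tempfile.TemporaryDirectory() as tmp:
        bed_path = os.path.join(tmp, "in.origion.hg38.bed")
        write_bed(bed_path, [("chr1", 1000000, 1000100)])
        with open(bed_path) as handle:
            for line in handle:
                print(parse_bed_line(line))
```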
3.4 Example
python3 CrossMap.py bed hg38ToHg19.over.chain.gz in.origion.hg38.bed out.convert.hg19.bed
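(1) Locate the CrossMap.py script you just installed; it is usually in the bin directory of your Python installation;
(2) bed specifies that the input file is in BED format, for example a file holding the coordinates of a single site;
(3) hg38ToHg19.over.chain.gz is the chain file downloaded earlier;
(4) in.origion.hg38.bed is the input BED file with the original coordinates; here it has 3 columns;
(5) out.convert.hg19.bed is the output file name; the output has the same number of columns as the input BED file.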
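When many files need converting, the same command can be assembled and run from a script. The sketch below only mirrors the argument order of the command above (`bed`, chain file, input, output). The function name is hypothetical, and calling the executable directly assumes it is on PATH; if it is not, pass the full path to the script instead.

```python
import os
import shutil
import subprocess


def build_crossmap_command(chain, bed_in, bed_out, executable="CrossMap.py"):
    """Return the argument list: executable, 'bed', chain, input, output."""
    return [executable, "bed", chain, bed_in, bed_out]


if __name__ == "__main__":
    cmd = build_crossmap_command(
        "hg38ToHg19.over.chain.gz",
        "in.origion.hg38.bed",
        "out.convert.hg19.bed",
    )
    print(" ".join(cmd))
    # Only run when the executable and both input files are actually present.
    ready = shutil.which(cmd[0]) and all(os.path.exists(p) for p in cmd[2:4])
    if ready:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    else:
        print("CrossMap or input files not found; command not executed")
```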
Note that if an interval is no longer contiguous after conversion to the new assembly, it is split into two or more intervals.
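To see why splitting happens, consider a toy model in which the two assemblies are related by a list of ungapped aligned blocks, each given as (source start, source end, target start). An interval that spans two blocks maps to two separate pieces whenever the blocks are not adjacent in the target. This is a simplified illustration under those assumptions (single chromosome, strand ignored, half-open intervals), not CrossMap's actual implementation, and the block coordinates are made up.

```python
def lift_interval(start, end, blocks):
    """Map the half-open interval [start, end) through aligned blocks.

    blocks is a list of (src_start, src_end, dst_start) tuples describing
    ungapped alignments. Returns the list of mapped (start, end) pieces.
    """
    pieces = []
    for src_start, src_end, dst_start in blocks:
        lo = max(start, src_start)
        hi = min(end, src_end)
        if lo < hi:  # the interval overlaps this block
            offset = dst_start - src_start
            pieces.append((lo + offset, hi + offset))
    return pieces


if __name__ == "__main__":
    # Two adjacent source blocks that land 50 bases apart in the target.
    blocks = [(100, 200, 1100), (200, 300, 1250)]
    print(lift_interval(120, 180, blocks))  # inside one block: one piece
    print(lift_interval(150, 250, blocks))  # spans both blocks: two pieces
```

In this toy example the interval 150-250 comes back as two pieces, 1150-1200 and 1250-1300, because the second block is shifted relative to the first.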