准备工作:
下载bismark及其依赖包perl、bowtie2或者histat2、samtools
下载参考基因组序列fasta格式
下载目标比对序列 fasta或者fastq格式
并且将程序路径写入环境变量(vim .bashrc)
(I) Running bismark_genome_preparation
USAGE:bismark_genome_preparation [options] <path_to_genome_folder>
A typical genome indexing could look like this:/bismark/bismark_genome_preparation --path_to_aligner /usr/bin/bowtie2/ --verbose /data/genomes/homo_sapiens/GRCh37/
path_to_aligner 指定用于align的程序的路径
bismark_genome_preparation --verbose /data/home/lt/Bismark-0.22.3/reference/
如果在环境变量中已经指定过,这里就不用指定。
Bismark will create two individual folders within this directory, one for a C->T converted genome and the other one for the G->A converted genome. After creating C->T and G->A versions of the genome they will be indexed in parallel using the indexer bowtie2-build (or hisat2-build).
这一步完成之后会生成两个文件夹。
(II) Bismark Alignment Step
在此步骤之前最好对将要比对的数据进行质控。
USAGE:bismark [options] --genome <genome_folder> {-1 <mates1> -2 <mates2> | <singles>}
A typical single-end analysis could look like this:
bismark --genome /data/genomes/homo_sapiens/GRCh38/ sample.fastq.gz
bismark --genome /data/home/lt/Bismark-0.22.3/reference/ /data/home/lt/fastq/liver/liver_1.fastq
bismark支持的目标比对文件格式
sequence format either FastQ or FastA
single-end or paired-end reads
input files can be uncompressed orgzip-compressed (ending in.gz)
variable read length support
directional or non-directional BS-Seq libraries
注意后面一定要包含对应的文件名而不是只指定文件夹(这里我错了还不自知嘤嘤嘤)
This will produce two output files:1.test_dataset_bismark_bt2.bam(contains all alignments plus methylation call strings)2.test_dataset_bismark_SE_report.txt(contains alignment and methylation summary)
这也会产生两个文件。
(III) Running deduplicate_bismark
deduplicate_bismark --bam [options] <filenames>
This command will deduplicate the Bismark alignment BAM file and remove all reads but one which align to the the very same position and in the same orientation. This step is recommended for whole-genome bisulfite samples, but should not be used for reduced representation libraries such as RRBS, amplicon or target enrichment libraries.
所以这一步我就跳过啦。
(IV) Running bismark_methylation_extractor
USAGE:
bismark_methylation_extractor [options] <filenames>
A typical command to extract context-dependent (CpG/CHG/CHH) methylation could look like this:
bismark_methylation_extractor --gzip --bedGraph test_dataset_bismark_bt2.bam
This will produce three methytlation output files:
CpG_context_test_dataset_bismark_bt2.txt.gz
CHG_context_test_dataset_bismark_bt2.txt.gz
CHH_context_test_dataset_bismark_bt2.txt.gz
as well as a bedGraph and a Bismark coverage file.
bismark_methylation_extractor --gzip --bedGraph liver_1_bismark_bt2.bam
The bismark_methylation_extractor comes with a few options, such as ignoring the first number of positions in the methylation call string, e.g. to remove a restriction enzyme site (if RRBS is performed with non-directional BS-Seq libraries it might be required to remove reconstituted MspI sites at the beginning of each read as they will introduce a bias into the first methylation call).
对于非方向性的BS-seq文库,必须移除内切酶位点的碱基,因为这会引入偏差(bias)。
bismark_methylation_extractor会自动识别 alignment mode,因此双端和单端的文件不需要特别设置
(V) Runningbismark2report
USAGE:bismark2report [options]
(VI) Runningbismark2summary
USAGE:bismark2summary [options]
(VII) Bismark Nucleotide Coverage report (bam2nuc)
那么这些文件都长什么样呢?如何用到下一步的分析里面?
且听下回分解。