Trim Galore automates read filtering and is very beginner-friendly. Specifically, trim_galore processes the data in several steps:
- Quality trimming
- Adapter trimming
- Removing short sequences
- Specialized trimming: hard-trimming and epigenetic clock trimming
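As a rough illustration of how these steps map onto command-line options, the sketch below assembles (but does not run) a single-end command one step at a time. The helper name `build_trim_cmd` is hypothetical, and the values are simply the ones used in the commands in this post. Hard-trimming (`--hardtrim5` / `--hardtrim3`) and `--clock` are, as far as I understand the Trim Galore documentation, separate modes, so they are left out here.

```shell
# Hypothetical helper: print the trim_galore command for one input file
# without running it, so each step's flags can be inspected.
build_trim_cmd() {
    local input
    input=$1
    set -- trim_galore
    # Step 1, quality trimming: Phred cutoff 25, Phred+33 encoding
    set -- "$@" -q 25 --phred33
    # Step 2, adapter trimming: adapter is auto-detected when none is given;
    # -e is the allowed error rate, --stringency the minimum adapter overlap
    set -- "$@" -e 0.1 --stringency 3
    # Step 3, removing short sequences: drop reads shorter than 35 bp
    set -- "$@" --length 35
    set -- "$@" "$input"
    printf '%s\n' "$*"
}

build_trim_cmd sample.fq.gz
```

Printing the command first is a cheap way to check the flags before committing to a long run.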
1. paired-end
mkdir -p "$data/trimmed_data" && cd "$data/fastq"
echo "trim_galore started at $(date)"
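## The sample prefix is the first three "_"-separated fields of each file name; adjust -f 1-3 to your naming scheme
ls *gz | cut -d "_" -f 1-3 | sort -u | while read -r i; do
  R1=${i}_1.fq.gz; R2=${i}_2.fq.gz
  trim_galore -q 25 --paired "$R1" "$R2" --phred33 --length 35 -e 0.1 --stringency 3 --gzip -o ../trimmed_data
done
## --gzip: compress the output files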
echo "trim_galore finished at $(date)"
## --length: reads shorter than this after trimming are discarded; for reference, the earliest ATAC-seq data had 36 bp reads
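The pairing step in the loop above depends on the file naming scheme. A minimal sketch of a more defensive variant is shown below: instead of `cut`, it strips the `_1.fq.gz` suffix and checks that the mate file exists before reporting a pair. The function name `list_pairs` and the `_1.fq.gz` / `_2.fq.gz` suffixes are assumptions taken from the command above, so adapt them to your own files.

```shell
# Hypothetical helper: print "R1<TAB>R2" for every sample in DIR whose
# two mate files both exist; warn on stderr about unpaired R1 files.
list_pairs() {
    local dir prefix r1 r2
    dir=$1
    for r1 in "$dir"/*_1.fq.gz; do
        [ -e "$r1" ] || continue      # the glob matched nothing
        prefix=${r1%_1.fq.gz}         # strip the R1 suffix to get the sample prefix
        r2=${prefix}_2.fq.gz
        if [ -e "$r2" ]; then
            printf '%s\t%s\n' "$r1" "$r2"
        else
            printf 'warning: no mate found for %s\n' "$r1" >&2
        fi
    done
}

# Possible usage (not run here): feed each pair to trim_galore
# list_pairs . | while IFS="$(printf '\t')" read -r R1 R2; do
#     trim_galore --paired "$R1" "$R2" -o ../trimmed_data
# done
```

Checking for the mate up front avoids launching a paired-end run that fails halfway through a batch because one file is missing.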
2. single-end
## Each file is started as its own background job; nohup keeps it running after logout
for i in "$data"/raw/*gz
do
nohup trim_galore -q 25 --phred33 --length 25 -e 0.1 --stringency 4 -o "$data/clean" "$i" &
done
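The loop above starts one background job per file, all at once, which can overload a machine when there are many files. One possible alternative, sketched here under the assumption that `find` and `xargs` support `-print0`, `-0` and `-P` (GNU and BSD versions do), is to cap the number of simultaneous jobs. The function name `run_single_end` and the `TRIM_GALORE` override variable are hypothetical and exist only to allow a dry run.

```shell
# Hypothetical helper: trim every *gz file in RAW_DIR, running at most
# JOBS trim_galore processes at a time (default 4).
# Set TRIM_GALORE=echo for a dry run that only prints the commands.
run_single_end() {
    local raw_dir out_dir jobs
    raw_dir=$1
    out_dir=$2
    jobs=${3:-4}
    # xargs appends one file name to the end of each command line.
    # TRIM_GALORE is deliberately unquoted so it can carry a command word.
    find "$raw_dir" -maxdepth 1 -name '*gz' -print0 |
        xargs -0 -n 1 -P "$jobs" \
            ${TRIM_GALORE:-trim_galore} -q 25 --phred33 --length 25 \
            -e 0.1 --stringency 4 -o "$out_dir"
}

# Possible usage (not run here):
# run_single_end "$data/raw" "$data/clean" 4
```

Unlike the `nohup ... &` loop, this call blocks until all files are processed, so wrap the whole call in `nohup` if it needs to survive a logout.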