Data Filtering and Quality Control
Software: fastp
Purpose: check the quality of the sequencing reads and filter them
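Installing the software
- Install Miniconda; see "Installing Miniconda" for details.
- Install fastp: conda install fastp (fastp is distributed through the bioconda channel; if that channel is not configured, use conda install -c bioconda fastp)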
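After installation it can help to confirm that fastp is actually on the PATH of the current shell. The following is only a minimal sketch; have_tool is a helper name made up for this example.

```shell
#!/bin/sh
# Succeed if the named command can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool fastp; then
    fastp --version      # print the installed version
else
    echo "fastp not found; activate the conda environment it was installed into" >&2
fi
```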
Filtering and quality-controlling the data
Work on the FASTQ files downloaded on day 1.
- Use the following command:
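# The file name contains a space, so every path is quoted.
# The filtered reads go to a separate directory so that the input file is not overwritten.
mkdir -p clean_data
fastp -i "SRR2176358_RNA-seq_of_Blondee_fruit_skin_with_fiesh_at_stage_ I_Rep._I.fastq.gz" -o "clean_data/SRR2176358_RNA-seq_of_Blondee_fruit_skin_with_fiesh_at_stage_ I_Rep._I.fastq.gz" -h "SRR2176358_RNA-seq_of_Blondee_fruit_skin_with_fiesh_at_stage_ I_Rep._I.html" -j "SRR2176358_RNA-seq_of_Blondee_fruit_skin_with_fiesh_at_stage_ I_Rep._I.json"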
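- Names this long are awkward to work with in later analysis, so batch-rename the FASTQ files with the rename command (the Perl version, which takes s/// expressions; the util-linux rename uses a different syntax). The Roman-numeral substitutions run from the longest numeral to the shortest, because the pattern for I would otherwise also match the start of II, III, and IV. Each command below is followed by one example of a resulting file name: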
$ rename 's/SRR.*_RNA-seq_of_//' *.gz
Blondee_fruit_skin_with_fiesh_at_stage_ I_Rep._I.fastq.gz
$ rename 's/_fruit_skin_with_fiesh//' *.gz
Blondee_at_stage_ I_Rep._I.fastq.gz
$ rename 's/at_stage_ IV/S4/' *.gz
Blondee_S4_at_harvest_Rep._I.fastq.gz
$ rename 's/at_stage_ III/S3/' *.gz
Blondee_S3_Rep._I.fastq.gz
$ rename 's/at_stage_ II/S2/' *.gz
Blondee_S2_Rep._I.fastq.gz
$ rename 's/at_stage_ I/S1/' *.gz
Blondee_S1_Rep._I.fastq.gz
$ rename 's/Blondee/BLO/' *.gz
BLO_S1_Rep._I.fastq.gz
$ rename 's/Kidds-D_8/KID/' *.gz
KID_S1_Rep._II.fastq.gz
$ rename 's/at_harvest_//' *.gz
BLO_S4_Rep._I.fastq.gz
$ rename 's/\._III/3/' *.gz
BLO_S1_Rep3.fastq.gz
$ rename 's/\._II/2/' *.gz
BLO_S1_Rep2.fastq.gz
$ rename 's/\._I/1/' *.gz
BLO_S1_Rep1.fastq.gz
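Before renaming for real, the chain of substitutions can be previewed without touching any file. The following is a minimal sketch, assuming only a POSIX shell and sed. The function name short_name is made up for illustration, and the dots in the last three patterns are escaped so that they match a literal period. The loop prints the mv commands it would run instead of running them.

```shell
#!/bin/sh
# Apply the same substitutions as the rename commands, in the same order,
# to a single file name and print the shortened name.
short_name() {
    printf '%s\n' "$1" | sed \
        -e 's/SRR.*_RNA-seq_of_//' \
        -e 's/_fruit_skin_with_fiesh//' \
        -e 's/at_stage_ IV/S4/' \
        -e 's/at_stage_ III/S3/' \
        -e 's/at_stage_ II/S2/' \
        -e 's/at_stage_ I/S1/' \
        -e 's/Blondee/BLO/' \
        -e 's/Kidds-D_8/KID/' \
        -e 's/at_harvest_//' \
        -e 's/\._III/3/' \
        -e 's/\._II/2/' \
        -e 's/\._I/1/'
}

# Dry run: show what would be renamed, without changing anything.
for f in *.gz; do
    [ -e "$f" ] || continue          # no .gz files in this directory
    new=$(short_name "$f")
    [ "$f" = "$new" ] && continue    # name is already short
    printf 'mv -- "%s" "%s"\n' "$f" "$new"
done
```

Once the preview looks right, replacing the printf line with mv -- "$f" "$new" performs the rename on systems where the Perl rename is not available.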
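After these renames the files have short names such as BLO_S1_Rep1.fastq.gz.
- Build the batch of fastp command lines with awk. The commands write to clean_data/, so create that directory first (mkdir -p clean_data):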
$ ls *.gz > fastq.lst
$ head -3 fastq.lst
BLO_S1_Rep1.fastq.gz
BLO_S1_Rep2.fastq.gz
BLO_S1_Rep3.fastq.gz
$ awk '{print "fastp -i "$1}' fastq.lst
fastp -i BLO_S1_Rep1.fastq.gz
fastp -i BLO_S1_Rep2.fastq.gz
fastp -i BLO_S1_Rep3.fastq.gz
…
$ awk '{print "fastp -i "$1" -o clean_data/"$1}' fastq.lst
fastp -i BLO_S1_Rep1.fastq.gz -o clean_data/BLO_S1_Rep1.fastq.gz
…
$ awk '{print "fastp -i "$1" -o clean_data/"$1" -h "$1".html -j "$1".json &"}' fastq.lst
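fastp -i BLO_S1_Rep1.fastq.gz -o clean_data/BLO_S1_Rep1.fastq.gz -h BLO_S1_Rep1.fastq.gz.html -j BLO_S1_Rep1.fastq.gz.json &
(The trailing & starts every command in the background so that all samples run in parallel. On a personal computer, which usually does not have enough threads for that, the & can be left out.)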
…
$ awk '{print "fastp -i "$1" -o clean_data/"$1" -h "$1".html -j "$1".json &"}' fastq.lst > run_fastp.sh
$ nohup sh run_fastp.sh &
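Starting every job with & launches all samples at once, which can overload a machine with few cores. One possible alternative is to run the commands in batches, sketched here under the assumption that the command file holds one complete command per line with no trailing &. The function name run_in_batches, the file name run_fastp_serial.sh, and the batch size of 4 are illustrative choices, not part of the original workflow.

```shell
#!/bin/sh
# Run the commands listed in file $1, one per line, at most $2 at a time.
# Each batch must finish before the next one starts.
run_in_batches() {
    cmd_file=$1
    max_jobs=$2
    running=0
    while IFS= read -r cmd; do
        [ -n "$cmd" ] || continue            # skip empty lines
        sh -c "$cmd" </dev/null &            # start one command in the background
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait                             # block until the whole batch is done
            running=0
        fi
    done < "$cmd_file"
    wait                                     # wait for the last, partial batch
}

# Example: build the command file without the trailing "&", then run 4 at a time.
if [ -f fastq.lst ]; then
    mkdir -p clean_data
    awk '{print "fastp -i "$1" -o clean_data/"$1" -h "$1".html -j "$1".json"}' fastq.lst > run_fastp_serial.sh
    run_in_batches run_fastp_serial.sh 4
fi
```

fastp also has its own -w/--thread option, so the total load is roughly the batch size multiplied by the threads used per job.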