2021-03-23 Transcriptome raw sequencing data: sra2fastq and data QC

SRA data

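SRA is the compressed format that the SRA (Sequence Read Archive) database uses to store raw next-generation sequencing data. SRA files cannot be processed directly: they must first be converted to fastq before quality control, adapter removal and similar steps can be run.
The fastq-dump command is part of the sra-tools package.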

(rna) Mar23 22:50:08 ~/Data/rawdata/sra
$ ls
CHECK                                    raw_md5.txt  SRR1039511
filereport_read_run_PRJNA229998_tsv.txt  sra.url      SRR1039512
md5.txt                                  SRR1039510
(rna) Mar23 00:18:50 ~/Data/rawdata/sra
$ ll
total 28
drwxrwxr-x 2 Mar23 Mar23 4096 Apr 10 22:39 ./
drwxrwxr-x 3 Mar23 Mar23 4096 Apr  4 23:19 ../
-rw-rw-r-- 1 Mar23 Mar23   45 Apr 10 22:25 CHECK
lrwxrwxrwx 1 Mar23 Mar23   68 Apr  6 23:23 filereport_read_run_PRJNA229998_tsv.txt -> /teach/t_rna/data/airway/sra/filereport_read_run_PRJNA229998_tsv.txt
-rw-rw-r-- 1 Mar23 Mar23  720 Apr 10 22:30 md5.txt
-rw-rw-r-- 1 Mar23 Mar23  135 Apr 10 22:39 raw_md5.txt
-rw-rw-r-- 1 Mar23 Mar23  816 Apr  6 23:29 sra.url
lrwxrwxrwx 1 Mar23 Mar23   39 Apr 10 21:21 SRR1039510 -> /teach/t_rna/data/airway/sra/SRR1039510
lrwxrwxrwx 1 Mar23 Mar23   39 Apr 10 21:21 SRR1039511 -> /teach/t_rna/data/airway/sra/SRR1039511
lrwxrwxrwx 1 Mar23 Mar23   39 Apr 10 21:21 SRR1039512 -> /teach/t_rna/data/airway/sra/SRR1039512
(rna) Mar23 00:19:01 ~/Data/rawdata/sra
$ fast
fasterq-dump              fastq-dump.2
fasterq-dump.2            fastq-dump.2.10.7
fasterq-dump.2.10.7       fastq-dump.2.3.5
fasterq-dump-orig         fastq-dump-orig
fasterq-dump-orig.2.10.7  fastq-dump-orig.2.10.7
fastp                     fastq-load
fastqc                    fastq-load.2
fastq-dump                fastq-load.2.3.5
(rna) Mar23 00:19:01 ~/Data/rawdata/sra
$ fastq
fastqc                  fastq-dump-orig
fastq-dump              fastq-dump-orig.2.10.7
fastq-dump.2            fastq-load
fastq-dump.2.10.7       fastq-load.2
fastq-dump.2.3.5        fastq-load.2.3.5
(rna) Mar23 00:19:01 ~/Data/rawdata/sra
$ fastq-dump -h

Usage: fastq-dump [ options ] [ accessions(s)... ]

Parameters:

  accessions(s)                    list of accessions to process


Options:

  -A|--accession <accession>       Replaces accession derived from <path> in
                                     filename(s) and deflines (only for
                                     single table dump)
     --table <table-name>          Table name within cSRA object, default is
                                     "SEQUENCE"
     --split-spot                  Split spots into individual reads
  -N|--minSpotId <rowid>           Minimum spot id
  -X|--maxSpotId <rowid>           Maximum spot id
     --spot-groups <[list]>[,...]  Filter by SPOT_GROUP (member): name[,...]
  -W|--clip                        Remove adapter sequences from reads
  -M|--minReadLen <len>            Filter by sequence length >= <len>
  -R|--read-filter <filter>        Split into files by READ_FILTER value
                                     [split], optionally filter by value:
                                     [pass|reject|criteria|redacted]
  -E|--qual-filter                 Filter used in early 1000 Genomes data: no
                                     sequences starting or ending with >= 10N
     --qual-filter-1               Filter used in current 1000 Genomes data
     --aligned                     Dump only aligned sequences
     --unaligned                   Dump only unaligned sequences
     --aligned-region <name[:from-to]>
                                   Filter by position on genome. Name can
                                     eiter by accession.version (ex:
                                     NC_000001.10) or file specific name (ex:
                                     "chr1" or "1". "from" and "to" are
                                     1-based coordinates
     --matepair_distance <from-to|unknown>
                                   Filter by distance between matepairs. Use
                                     "unknown" to find matepairs split
                                     between the references. Use from-to to
                                     limit matepair distance on the same
                                     reference
     --skip-technical              Dump only biological reads
  -O|--outdir <path>               Output directory, default is working
                                     directory '.'
  -Z|--stdout                      Output to stdout, all split data become
                                     joined into single stream
     --gzip                        Compress output using gzip: deprecated,
                                     not recommended
     --bzip2                       Compress output using bzip2: deprecated,
                                     not recommended
     --split-files                 Write reads into separate files. Read
                                     number will be suffixed to the file
                                     name. NOTE! The `--split-3` option is
                                     recommended. In cases where not all
                                     spots have the same number of reads,
                                     this option will produce files that WILL
                                     CAUSE ERRORS in most programs which
                                     process split pair fastq files.
     --split-e                     3-way splitting for mate-pairs. For each
                                     spot, if there are two biological reads
                                     satisfying filter conditions, the first
                                     is placed in the `*_1.fastq` file, and
                                     the second is placed in the `*_2.fastq`
                                     file. If there is only one biological
                                     read satisfying the filter conditions,
                                     it is placed in the `*.fastq` file.All
                                     other reads in the spot are ignored.
  -G|--spot-group                  Split into files by SPOT_GROUP (member
                                     name)
  -T|--group-in-dirs               Split into subdirectories instead of files
  -K|--keep-empty-files            Do not delete empty files
  -C|--dumpcs <cskey>              Formats sequence using color space
                                     (default for SOLiD), "cskey" may be
                                     specified for translation or else
                                     specify "dflt" to use the default value
  -B|--dumpbase                    Formats sequence using base space (default
                                     for other than SOLiD).
  -Q|--offset <integer             Offset to use for quality conversion,
                                     default is 33
     --fasta <line-width>          FASTA only, no qualities, with can be
                                     "default" or "0" for no wrapping
     --suppress-qual-for-cskey     suppress quality-value for cskey
  -F|--origfmt                     Defline contains only original sequence
                                     name
  -I|--readids                     Append read id after spot id as
                                     'accession.spot.readid' on defline
     --helicos                     Helicos style defline
     --defline-seq <fmt>           Defline format specification for sequence.
     --defline-qual <fmt>          Defline format specification for quality.
                                     <fmt> is string of characters and/or
                                     variables. The variables can be one of:
                                     $ac - accession, $si spot id, $sn spot
                                     name, $sg spot group (barcode), $sl spot
                                     length in bases, $ri read number, $rn
                                     read name, $rl read length in bases.
                                     '[]' could be used for an optional
                                     output: if all vars in [] yield empty
                                     values whole group is not printed. Empty
                                     value is empty string or for numeric
                                     variables. Ex: @$sn[_$rn]/$ri '_$rn' is
                                     omitted if name is empty
     --ngc <path>                  <path> to ngc file
     --perm <path>                 <path> to permission file
     --location <location>         location in cloud
     --cart <path>                 <path> to cart file
     --disable-multithreading      disable multithreading
  -V|--version                     Display the version of the program
  -L|--log-level <level>           Logging level as number or enum string.
                                     One of
                                     (fatal|sys|int|err|warn|info|debug) or
                                     (0-6) Current/default is warn
     --option-file file            Read more options and parameters from the
                                     file.
  -h|--help                        print this message

"fastq-dump" version 2.10.7
(rna) Mar23 00:23:03 ~/Data/rawdata/sra
$ fastq-dump --gzip --split-3 -X 25000 -O ./ SRR1039510
Read 25000 spots for SRR1039510 
Written 25000 spots for SRR1039510
(rna) Mar23 00:24:46 ~/Data/rawdata/sra
$ ls
CHECK                                    SRR1039510
filereport_read_run_PRJNA229998_tsv.txt  SRR1039510_1.fastq.gz
md5.txt                                  SRR1039510_2.fastq.gz
raw_md5.txt                              SRR1039511
sra.url                                  SRR1039512
(rna) Mar23 00:25:27 ~/Data/rawdata/sra
$ zless -S SRR1039510_1.fastq.gz # view the decompressed file with long lines chopped instead of wrapped
@SRR1039510.1 HWI-ST177:290:C0TECACXX:1:1101:1373:2104 length=63
TGGGAGGCTGAGGCAGGAGAATCACTTAAACCTGGGAGGCAGAGGTTACAGTGAGCCGAGATT
+SRR1039510.1 HWI-ST177:290:C0TECACXX:1:1101:1373:2104 length=63
HJJJIJJJJJJJJIJJJGHHIJIIIIIIJJEHGGIJGIJIJJIJHHHGGFFDFFFDEDDDBDC
@SRR1039510.2 HWI-ST177:290:C0TECACXX:1:1101:1340:2124 length=63
AAAGAAGGCGACAGTGAGAAGGAGTCCGAGAAGAGTGATGGAGACCCAATAGTCGATCCTGAG
+SRR1039510.2 HWI-ST177:290:C0TECACXX:1:1101:1340:2124 length=63
HJJJJJJJJJJJIJIIGIJJJJGJHJJJHHDFFFE@CEEEDDDDDDDDDDDDDDDBDDDDDDD
@SRR1039510.3 HWI-ST177:290:C0TECACXX:1:1101:1273:2183 length=63
CTGCTGGGCCCCAAGGTCCTCCTGGTCCCAGTGGTGAAGAAGGAAAGAGAGGCCCTAATGGGG
+SRR1039510.3 HWI-ST177:290:C0TECACXX:1:1101:1273:2183 length=63
HJJJJJJJJJJJJJJJGIIIJJJJJHIJJJJHIJFHGIJJJJJJJHHHHHFFFDDDEDDDDDD
@SRR1039510.4 HWI-ST177:290:C0TECACXX:1:1101:1562:2147 length=63
CTTGGCTGCAGCCATCCCGCTTAGCCTGCCTCACCCACACCCGTGTGGTACCTTCAGCCCTGG
+SRR1039510.4 HWI-ST177:290:C0TECACXX:1:1101:1562:2147 length=63
HJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJIJHJJIJJJJJHHFFFFEEEEEEEDDDDDDDB
@SRR1039510.5 HWI-ST177:290:C0TECACXX:1:1101:1577:2181 length=63
TGAGACAGGTAATTCAGTATAGTAGATTAATATTTTTAATATATATTTTCCCTTAAGATTTCC
+SRR1039510.5 HWI-ST177:290:C0TECACXX:1:1101:1577:2181 length=63
HIJJJJJJJEHJIJJJJIIIJJIIJJJJJJJJJJJJJJJJJJJJJJJJJEHJGI>FFCBGGGI
@SRR1039510.6 HWI-ST177:290:C0TECACXX:1:1101:1650:2181 length=63
ATTTCTCAGTGTAGAAATCATGTCTTCTTAATTGCTGAACCTTACTGCAAAAACTTGTGATGT
+SRR1039510.6 HWI-ST177:290:C0TECACXX:1:1101:1650:2181 length=63
HJJJJJJJJJJJHIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJHHIJJD
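Each record above spans four lines: a header starting with @, the sequence, a + line (here repeating the header), and a quality string with one character per base. The qualities are Phred+33 encoded (the fastq-dump help above lists 33 as the default -Q offset), so score = ASCII code − 33, and a score Q corresponds to an error probability of 10^(−Q/10): Q20 means 1%, Q40 means 0.01%. A minimal sketch for decoding a quality string; the function names phred33_scores and phred33_mean are hypothetical helpers, not part of any tool used here:

```shell
#!/bin/sh
# Sketch: decode a Phred+33 quality string (line 4 of a FASTQ record) into scores.
# od prints each character's byte value in decimal; awk subtracts the offset 33.
phred33_scores() {
  printf '%s' "$1" | od -An -tu1 -v |
    awk '{ for (i = 1; i <= NF; i++) printf "%s%d", (n++ ? " " : ""), $i - 33 } END { print "" }'
}

# Mean score over the whole read, printed with two decimals.
phred33_mean() {
  printf '%s' "$1" | od -An -tu1 -v |
    awk '{ for (i = 1; i <= NF; i++) { s += $i - 33; c++ } } END { if (c) printf "%.2f\n", s / c }'
}

phred33_scores HJJJI   # → 39 41 41 41 40
phred33_mean HJJJI     # → 40.40
```

'J' decodes to 41, the top of the range seen in these reads, which is why the quality lines above are mostly J.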
(rna) Mar23 00:29:32 ~/Data/rawdata/sra
$ rm -rf SRR103951*gz # delete these test files: the fastq output should not be unpacked into the sra directory
(rna) Mar23 00:29:51 ~/Data/rawdata/sra
$ ls
CHECK                                    raw_md5.txt  SRR1039511
filereport_read_run_PRJNA229998_tsv.txt  sra.url      SRR1039512
md5.txt                                  SRR1039510
(rna) Mar23 00:29:58 ~/Data/rawdata/sra
$ cd ..
(rna) Mar23 00:30:11 ~/Data/rawdata
$ ls
sra
(rna) Mar23 00:30:12 ~/Data/rawdata
$ mkdir fq
(rna) Mar23 00:30:41 ~/Data/rawdata
$ ls
fq  sra
(rna) Mar23 00:30:42 ~/Data/rawdata
$ cd fq/
(rna) Mar23 00:30:49 ~/Data/rawdata/fq
$ ls
(rna) Mar23 10:46:36 ~/Data/rawdata/fq
$ ln -s /teach/t_rna/data/airway/sra/sample.ID 
(rna) Mar23 10:46:58 ~/Data/rawdata/fq
$ ls
sample.ID
(rna) Mar23 10:47:00 ~/Data/rawdata/fq
$ cat sample.ID 
SRR1039510
SRR1039511
SRR1039512
(rna) Mar23 20:24:32 ~/Data/rawdata/fq
$ ls /trainee2/Mar23/Data/rawdata/sra/SRR* | while read id
do 
echo "fastq-dump --gzip --split-e -X 25000 -O ${fqdir} ${id}"
done >sra2fq.sh
(rna) Mar23 18:48:53 ~/Data/rawdata/fq
$ ls
sample.ID  sra2fq.sh
(rna) Mar23 18:48:57 ~/Data/rawdata/fq
$ less sra2fq.sh 
fastq-dump --gzip --split-e -X 25000 -O /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/sra/SRR1039510
fastq-dump --gzip --split-e -X 25000 -O /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/sra/SRR1039510_1.fastq.gz
fastq-dump --gzip --split-e -X 25000 -O /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/sra/SRR1039510_2.fastq.gz
fastq-dump --gzip --split-e -X 25000 -O /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/sra/SRR1039511
fastq-dump --gzip --split-e -X 25000 -O /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/sra/SRR1039512
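In the generated sra2fq.sh, lines 2 and 3 point at SRR1039510_1.fastq.gz and SRR1039510_2.fastq.gz: the SRR* glob also matched the fastq.gz test files that were in the sra directory at that moment, and those are not SRA inputs for fastq-dump. A sketch of a more robust generator that reads the accessions from sample.ID instead of globbing. The function name make_sra2fq and its argument order are hypothetical; the options follow the single-sample test earlier (--split-3, -X 25000), so drop -X 25000 for a full run:

```shell
#!/bin/sh
# Sketch: print one fastq-dump command per accession listed in a file such as sample.ID.
# Reading accessions from the list means stray files in the sra directory are never picked up.
make_sra2fq() {
  sra_dir=$1   # directory holding the SRA files, one per accession
  fq_out=$2    # output directory for the fastq.gz files
  list=$3      # one accession per line
  while read -r id; do
    [ -n "$id" ] || continue   # skip blank lines
    echo "fastq-dump --gzip --split-3 -X 25000 -O ${fq_out} ${sra_dir}/${id}"
  done < "$list"
}
```

Usage, with the paths from this transcript: `make_sra2fq /trainee2/Mar23/Data/rawdata/sra /trainee2/Mar23/Data/rawdata/fq sample.ID > sra2fq.sh`.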
(rna) Mar23 20:39:19 ~
$ nohup sh sra2fq.sh >sra2fq.log & # nohup together with & runs the script in the background, and it keeps running after logout
[1] 26338
(rna) Mar23 20:39:58 ~
$ nohup: ignoring input and redirecting stderr to stdout
# in "nohup sh sra2fq.sh >sra2fq.log", sra2fq.log is the log file
[1]+  Done                    nohup sh sra2fq.sh > sra2fq.log
(rna) Mar23 19:11:38 ~/Data/rawdata/fq
$ jobs
[1]+  Done                    nohup sh sra2fq.sh > sra2fq.log
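Interactively, `jobs` is enough to see that the background job is done. Inside a script, one option is to keep the PID from `$!` and `wait` for it. A minimal sketch; run_bg and bg_pid are hypothetical names, and the explicit `2>&1` sends stderr into the same log instead of relying on nohup's own redirection:

```shell
#!/bin/sh
# Sketch: run a shell script in the background, immune to hangups (nohup + &),
# with stdout and stderr both written to one log file.
run_bg() {
  script=$1
  log=$2
  nohup sh "$script" > "$log" 2>&1 &
  bg_pid=$!   # PID of the background job, usable with wait, kill or ps
}
```

Usage: `run_bg sra2fq.sh sra2fq.log`, then `wait "$bg_pid" && echo "sra2fq.sh finished OK"`; `wait` returns the script's exit status.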
(rna) Mar23 19:12:03 ~/Data/rawdata/fq
$ ll
total 8560
drwxrwxr-x 2 Mar23 Mar23    4096 Apr 15 19:11 ./
drwxrwxr-x 4 Mar23 Mar23    4096 Apr 11 00:30 ../
lrwxrwxrwx 1 Mar23 Mar23      38 Apr 11 10:46 sample.ID -> /teach/t_rna/data/airway/sra/sample.ID
-rw-rw-r-- 1 Mar23 Mar23     895 Apr 15 19:11 sra2fq.log
-rw-rw-r-- 1 Mar23 Mar23     602 Apr 15 19:07 sra2fq.sh
-rw-rw-r-- 1 Mar23 Mar23 1441724 Apr 15 19:08 SRR1039510_1.fastq.gz
-rw-rw-r-- 1 Mar23 Mar23 1441964 Apr 15 19:08 SRR1039510_2.fastq.gz
-rw-rw-r-- 1 Mar23 Mar23 1446801 Apr 15 19:11 SRR1039511_1.fastq.gz
-rw-rw-r-- 1 Mar23 Mar23 1423626 Apr 15 19:11 SRR1039511_2.fastq.gz
-rw-rw-r-- 1 Mar23 Mar23 1481234 Apr 15 19:11 SRR1039512_1.fastq.gz
-rw-rw-r-- 1 Mar23 Mar23 1502072 Apr 15 19:11 SRR1039512_2.fastq.gz
(rna) Mar23 20:40:49 ~/Data/rawdata/fq
$ zless -S SRR1039510_1.fastq.gz | wc -l
100000

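Note: in a real analysis, leave out the -X parameter. -X 25000 only dumps the first 25000 spots of each run, which is just enough for a quick test.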
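The `wc -l` check above works because fastq-dump writes standard four-line FASTQ records, so the read count is the line count divided by 4: 100000 lines are 25000 reads, matching `-X 25000`. The same check as a small function (count_reads is a hypothetical name):

```shell
#!/bin/sh
# Sketch: count the reads in a gzipped FASTQ file.
# Each record is exactly four lines (@header, sequence, '+' line, qualities),
# so reads = lines / 4. Assumes unwrapped 4-line records.
count_reads() {
  lines=$(gzip -dc "$1" | wc -l | tr -d ' ')   # tr strips the padding some wc versions add
  echo $((lines / 4))
}
```

Usage: `count_reads SRR1039510_1.fastq.gz`.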

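Data quality control

FastQC

FastQC computes quality statistics on raw data in fastq format. It is used to assess the sequencing results and provides a reference for the subsequent trimming and filtering step.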

(rna) Mar23 20:40:59 ~/Data/rawdata/fq
$ cd ..
(rna) Mar23 20:48:01 ~/Data/rawdata
$ ls
fq  sra
(rna) Mar23 20:48:02 ~/Data/rawdata
$ mkdir qc
(rna) Mar23 20:48:09 ~/Data/rawdata
$ ls
fq  qc  sra
(rna) Mar23 20:54:46 ~/Data/rawdata
$ cd qc/
(rna) Mar23 20:56:02 ~/Data/rawdata/qc
$ ls
(rna) Mar23 20:56:04 ~/Data/rawdata/qc
$ pwd
/trainee2/Mar23/Data/rawdata/qc
(rna) Mar23 20:56:29 ~/Data/rawdata/qc
$ ls /trainee2/Mar23/Data/rawdata/fq/
sample.ID   SRR1039510_1.fastq.gz  SRR1039511_2.fastq.gz  tq.gz
sra2fq.log  SRR1039510_2.fastq.gz  SRR1039512_1.fastq.gz
sra2fq.sh   SRR1039511_1.fastq.gz  SRR1039512_2.fastq.gz
(rna) Mar23 20:56:43 ~/Data/rawdata/qc
$ qcdir=/trainee2/Mar23/Data/rawdata/qc/
(rna) Mar23 20:57:04 ~/Data/rawdata/qc
$ fqdir=/trainee2/Mar23/Data/rawdata/fq/
(rna) Mar23 20:57:15 ~/Data/rawdata/qc
$ fastqc -t 6 -o $qcdir $fqdir/SRR1039510_1.fastq.gz
Started analysis of SRR1039510_1.fastq.gz
Approx 5% complete for SRR1039510_1.fastq.gz
...
Approx 100% complete for SRR1039510_1.fastq.gz
Analysis complete for SRR1039510_1.fastq.gz
(rna) Mar23 21:04:48 ~/Data/rawdata/qc
$ ls
SRR1039510_1_fastqc.html  SRR1039510_1_fastqc.zip
(rna) Mar23 21:05:03 ~/Data/rawdata/qc
$ ls $fqdir/SRR*.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039510_2.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039511_1.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039511_2.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039512_1.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039512_2.fastq.gz
(rna) Mar23 21:07:21 ~/Data/rawdata/qc
$ fastqc -t 10 -o $qcdir $fqdir/SRR*.fastq.gz
Started analysis of SRR1039510_1.fastq.gz
Approx 5% complete for SRR1039510_1.fastq.gz
...
Approx 100% complete for SRR1039510_1.fastq.gz
Analysis complete for SRR1039510_1.fastq.gz
Started analysis of SRR1039510_2.fastq.gz
...
Analysis complete for SRR1039510_2.fastq.gz
Started analysis of SRR1039511_1.fastq.gz
...
Analysis complete for SRR1039511_1.fastq.gz
Started analysis of SRR1039511_2.fastq.gz
...
Analysis complete for SRR1039511_2.fastq.gz
Started analysis of SRR1039512_1.fastq.gz
...
Analysis complete for SRR1039512_1.fastq.gz
Started analysis of SRR1039512_2.fastq.gz
...
Analysis complete for SRR1039512_2.fastq.gz
(rna) Mar23 21:08:22 ~/Data/rawdata/qc
$ ls
SRR1039510_1_fastqc.html  SRR1039511_2_fastqc.html
SRR1039510_1_fastqc.zip   SRR1039511_2_fastqc.zip
SRR1039510_2_fastqc.html  SRR1039512_1_fastqc.html
SRR1039510_2_fastqc.zip   SRR1039512_1_fastqc.zip
SRR1039511_1_fastqc.html  SRR1039512_2_fastqc.html
SRR1039511_1_fastqc.zip   SRR1039512_2_fastqc.zip
(rna) Mar23 21:17:55 ~/Data/rawdata/qc
$ vim qc.sh

qcdir=/trainee2/Mar23/Data/rawdata/qc/
fqdir=/trainee2/Mar23/Data/rawdata/fq/
fastqc -t 10 -o $qcdir $fqdir/SRR*.fastq.gz
touch finished.ok

(rna) Mar23 21:16:49 ~/Data/rawdata/qc
$ nohup sh qc.sh >qc.log &
[1] 32657
(rna) Mar23 21:17:45 ~/Data/rawdata/qc
$ nohup: ignoring input and redirecting stderr to stdout

[1]+  Done                    nohup sh qc.sh > qc.log
(rna) Mar23 21:17:53 ~/Data/rawdata/qc
$ ls
finished.ok               SRR1039511_1_fastqc.zip
qc.log                    SRR1039511_2_fastqc.html
qc.sh                     SRR1039511_2_fastqc.zip
SRR1039510_1_fastqc.html  SRR1039512_1_fastqc.html
SRR1039510_1_fastqc.zip   SRR1039512_1_fastqc.zip
SRR1039510_2_fastqc.html  SRR1039512_2_fastqc.html
SRR1039510_2_fastqc.zip   SRR1039512_2_fastqc.zip
SRR1039511_1_fastqc.html
(rna) Mar23 21:20:52 ~/Data/rawdata/qc
$ ll
total 5580
drwxrwxr-x 2 Mar23 Mar23   4096 Apr 15 21:20 ./
drwxrwxr-x 5 Mar23 Mar23   4096 Apr 15 20:50 ../
-rw-rw-r-- 1 Mar23 Mar23      0 Apr 15 21:17 finished.ok
-rw-rw-r-- 1 Mar23 Mar23   6036 Apr 15 21:17 qc.log
-rw-rw-r-- 1 Mar23 Mar23    141 Apr 15 21:20 qc.sh
-rw-rw-r-- 1 Mar23 Mar23 631202 Apr 15 21:17 SRR1039510_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 314996 Apr 15 21:17 SRR1039510_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 633039 Apr 15 21:17 SRR1039510_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 312600 Apr 15 21:17 SRR1039510_2_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 634543 Apr 15 21:17 SRR1039511_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 310070 Apr 15 21:17 SRR1039511_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 634839 Apr 15 21:17 SRR1039511_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 311681 Apr 15 21:17 SRR1039511_2_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 632535 Apr 15 21:17 SRR1039512_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 307280 Apr 15 21:17 SRR1039512_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 634147 Apr 15 21:17 SRR1039512_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 312822 Apr 15 21:17 SRR1039512_2_fastqc.zip
(rna) Mar23 21:21:25 ~/Data/rawdata/qc
$ less qc.log 
(rna) Mar23 21:24:32 ~/Data/rawdata/qc
$ cat qc.sh 
qcdir=/trainee2/Mar23/Data/rawdata/qc/
fqdir=/trainee2/Mar23/Data/rawdata/fq/
fastqc -t 10 -o $qcdir $fqdir/SRR*.fastq.gz
touch finished.ok
(rna) Mar23 21:25:11 ~/Data/rawdata/qc
$ ls
finished.ok               SRR1039511_1_fastqc.zip
qc.log                    SRR1039511_2_fastqc.html
qc.sh                     SRR1039511_2_fastqc.zip
SRR1039510_1_fastqc.html  SRR1039512_1_fastqc.html
SRR1039510_1_fastqc.zip   SRR1039512_1_fastqc.zip
SRR1039510_2_fastqc.html  SRR1039512_2_fastqc.html
SRR1039510_2_fastqc.zip   SRR1039512_2_fastqc.zip
SRR1039511_1_fastqc.html
(rna) Mar23 21:25:29 ~/Data/rawdata/qc
$ multiqc *.zip
[WARNING]         multiqc : MultiQC Version v1.10.1 now available!
[INFO   ]         multiqc : This is MultiQC v1.10
[INFO   ]         multiqc : Template    : default
[INFO   ]         multiqc : Searching   : /trainee2/Mar23/Data/rawdata/qc/SRR1039510_1_fastqc.zip
[INFO   ]         multiqc : Searching   : /trainee2/Mar23/Data/rawdata/qc/SRR1039510_2_fastqc.zip
[INFO   ]         multiqc : Searching   : /trainee2/Mar23/Data/rawdata/qc/SRR1039511_1_fastqc.zip
[INFO   ]         multiqc : Searching   : /trainee2/Mar23/Data/rawdata/qc/SRR1039511_2_fastqc.zip
[INFO   ]         multiqc : Searching   : /trainee2/Mar23/Data/rawdata/qc/SRR1039512_1_fastqc.zip
[INFO   ]         multiqc : Searching   : /trainee2/Mar23/Data/rawdata/qc/SRR1039512_2_fastqc.zip
Searching   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6  
[INFO   ]          fastqc : Found 6 reports
[INFO   ]         multiqc : Compressing plot data
[INFO   ]         multiqc : Report      : multiqc_report.html
[INFO   ]         multiqc : Data        : multiqc_data
[INFO   ]         multiqc : MultiQC complete
(rna) Mar23 22:39:16 ~/Data/rawdata/qc
$ vim qc.sh 
# define the input and output directories
qcdir=/trainee2/Mar23/Data/rawdata/qc/
fqdir=/trainee2/Mar23/Data/rawdata/fq/
# fastqc analysis
fastqc -t 10 -o $qcdir $fqdir/SRR*.fastq.gz
# aggregate the FastQC reports with MultiQC
multiqc *.zip
touch finished.ok

(rna) Mar23 22:42:35 ~/Data/rawdata/qc
$ cat qc.sh 
# define the input and output directories
qcdir=/trainee2/Mar23/Data/rawdata/qc/
fqdir=/trainee2/Mar23/Data/rawdata/fq/
# fastqc analysis
fastqc -t 10 -o $qcdir $fqdir/SRR*.fastq.gz
# aggregate the FastQC reports with MultiQC
multiqc *.zip
touch finished.ok
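As written, qc.sh creates finished.ok even if fastqc or multiqc fails, because `sh` keeps going after a failing command. One way to make the marker trustworthy is to create it only when the wrapped step exits with status 0. A minimal sketch; run_then_mark is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: run a command and create a "finished" marker file only if it succeeds.
run_then_mark() {
  marker=$1
  shift
  rm -f "$marker"   # clear a stale marker left by an earlier run
  if "$@"; then
    touch "$marker"
  else
    status=$?
    echo "step failed (exit $status): $*" >&2
    return "$status"
  fi
}
```

In qc.sh this could replace the last line as `run_then_mark finished.ok multiqc *.zip`. Adding `set -e` at the top of the script has a similar effect for the whole file: it stops at the first failing command, before `touch finished.ok` is reached.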

Data filtering

(rna) Mar23 22:57:50 ~/Data
$ cd cleandata/
(rna) Mar23 22:58:17 ~/Data/cleandata
$ mkdir trim_galore
(rna) Mar23 23:00:55 ~/Data/cleandata
$ cd trim_galore/
(rna) Mar23 23:01:34 ~/Data/cleandata/trim_galore
$ zless -S ../../rawdata/fq/SRR1039510_1.fastq.gz | grep 'AGATCGGAAGAGC'
CAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCT
AATCGGGGCTGGAGGCACTTCAGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTG
GATCGAGTATAAAGGGAATTGCCTCCCACCCCTGCCTCTGCCAGATCGGAAGAGCACACGTCT
GGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTG
TGGACACCAAGATCACATGGCCCAATGGCCTGACGCTGGAGATCGGAAGAGCACACGTCTGAA
TCCCTGATGTGAATGTAAACTTGAAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAG
GACGCGCAGACAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCC
CCTTGGCTCGGGCTCATCGTGCTCCTGGGCAGCTAGATCGGAAGAGCACACGTCTGAACTCCA
CCAGGCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATATCGGATGCCGTCT
GCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTG
CCCAGATCGGAAGAGCACACGTCCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCT
CAGCACAGCCTCTCCTGCGGGCCAGCGTCATCAAGAAAACATCAGATCGGAAGAGCACACGTC
CCAGCAACTTTTTGAAACTAAAGGCGCTTTCCGCCATCACCGCCACTGGCAGATCGGAAGAGC
ACACGTCTGAACTCCAGTCACACAGTGATCTCTATGCCGTCTTCTGCTTGAGATCGGAAGAGC
CTCTCCTGGAGGTTTCCAGTAGCACTACTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCA
ACACGTCTGAACTCCAGTCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCT
TGGACAGGGTTTCTCCGAGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCT
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTT
CACCGTTTTTGTGGTTAGCTCCTTCTTGCCAACCAACCATGAGCTCCCAGATCGGAAGAGCAC
CAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCT
TCAGCTTGCTCATCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTA
CCTGTCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTT
GGACCAGCCACTGTGGCAGATGGGAGCCAAGAGATCGGAAGAGCACACGTCTGAACTCCAGTC
GTGTCGGGGCGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCC
GCACAGAGTGTAGATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTA
GGTGTGGTAGATCCGTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCG
ACACGTCTGAACTCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTA
CCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTG
(rna) Mar23 23:04:00 ~/Data/cleandata/trim_galore
$ zless -S ../../rawdata/fq/SRR1039510_1.fastq.gz | grep 'AGATCGGAAGAGC'| wc -l
28 # 28 reads contain the adapter
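The `grep | wc -l` count above treats the FASTQ file as plain text, so it counts any matching line and not only sequence lines. A slightly stricter sketch looks only at line 2 of each 4-line FASTQ record and also reports the percentage. The function name `count_adapter_reads` is hypothetical, and `gzip -dc` stands in for `zless` for portability. If every match above sits on a sequence line, the result should agree with Trim Galore's own auto-detection further down (28 of 25000 reads, 0.11%).

```shell
#!/usr/bin/env bash
# Count reads whose sequence line contains a given adapter fragment.
# Usage: count_adapter_reads <reads.fastq.gz> [adapter]
# Default adapter: the 13-bp Illumina TruSeq prefix used in this post.
count_adapter_reads() {
    local fq="$1"
    local adapter="${2:-AGATCGGAAGAGC}"
    # A FASTQ record has 4 lines; the sequence is line 2 of each record (NR % 4 == 2).
    gzip -dc "$fq" | awk -v ad="$adapter" '
        NR % 4 == 2 { total++; if (index($0, ad) > 0) hit++ }
        END {
            # "+ 0" prints 0 instead of an empty string when nothing matched
            printf "%d\t%d\t%.2f%%\n", hit + 0, total + 0, (total ? 100 * hit / total : 0)
        }'
}
```

Usage: `count_adapter_reads ../../rawdata/fq/SRR1039510_1.fastq.gz` prints three tab-separated fields: reads with the adapter, total reads, and the percentage.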
(rna) Mar23 23:07:45 ~/Data/cleandata/trim_galore
$ rawdata=/trainee2/Mar23/Data/rawdata/fq/
(rna) Mar23 23:10:36 ~/Data/cleandata/trim_galore
$ cleandata=/trainee2/Mar23/Data/cleandata/trim_galore/
(rna) Mar23 23:11:04 ~/Data/cleandata/trim_galore
$ trim_galore --phred33 -q 20 --length 36 --stringency 3 --fastqc --paired --max_n 3 -o $cleandata $rawdata/SRR1039510_1.fastq.gz $rawdata/SRR1039510_2.fastq.gz
Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 3.3
single-core operation.
Output will be written into the directory: /trainee2/Mar23/Data/cleandata/trim_galore/


AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz <<)

Found perfect matches for the following adapter sequences:
Adapter type  Count  Sequence       Sequences analysed  Percentage
Illumina      28     AGATCGGAAGAGC  25000               0.11
smallRNA      0      TGGAATTCTCGG   25000               0.00
Nextera       0      CTGTCTCTTATA   25000               0.00
Using Illumina adapter for trimming (count: 28). Second best hit was smallRNA (count: 0)

Writing report to '/trainee2/Mar23/Data/cleandata/trim_galore/SRR1039510_1.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.6
Cutadapt version: 3.3
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Maximum number of tolerated Ns: 3
Minimum required adapter overlap (stringency): 3 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 36 bp
Running FastQC on the data once trimming has completed
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 3.3). Setting -j 1
Writing final adapter and quality trimmed output to SRR1039510_1_trimmed.fq.gz


  >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz <<< 
This is cutadapt 3.3 with Python 3.7.0
Command line parameters: -j 1 -e 0.1 -q 20 -O 3 -a AGATCGGAAGAGC /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 0.38 s (15 µs/read; 3.94 M reads/minute).

=== Summary ===

Total reads processed:                  25,000
Reads with adapters:                       714 (2.9%)
Reads written (passing filters):        25,000 (100.0%)

Total basepairs processed:     1,575,000 bp
Quality-trimmed:                  13,073 bp (0.8%)
Total written (filtered):      1,558,267 bp (98.9%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 714 times

No. of allowed errors:
1-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 30.3%
  C: 30.0%
  G: 26.5%
  T: 13.0%
  none/other: 0.3%

Overview of removed sequences
length  count  expect  max.err  error counts
3       524    390.6   0        524
4       119    97.7    0        119
5       27     24.4    0        27
6       3      6.1     0        3
7       2      1.5     0        2
8       1      0.4     0        1
11      2      0.0     1        2
12      3      0.0     1        2 1
13      2      0.0     1        2
15      1      0.0     1        1
20      1      0.0     1        1
21      2      0.0     1        1 1
23      1      0.0     1        0 1
24      1      0.0     1        1
29      1      0.0     1        1
32      1      0.0     1        1
33      1      0.0     1        1
38      2      0.0     1        2
39      1      0.0     1        1
40      1      0.0     1        1
41      1      0.0     1        1
44      2      0.0     1        2
46      1      0.0     1        1
48      2      0.0     1        2
52      2      0.0     1        1 1
57      1      0.0     1        1
58      2      0.0     1        2
60      2      0.0     1        2
62      3      0.0     1        2 1
63      2      0.0     1        1 1

RUN STATISTICS FOR INPUT FILE: /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
=============================================
25000 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Writing report to '/trainee2/Mar23/Data/cleandata/trim_galore/SRR1039510_2.fastq.gz_trimming_report.txt'

SUMMARISING RUN PARAMETERS
==========================
Input filename: /trainee2/Mar23/Data/rawdata/fq//SRR1039510_2.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.6
Cutadapt version: 3.3
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Maximum number of tolerated Ns: 3
Minimum required adapter overlap (stringency): 3 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 36 bp
Running FastQC on the data once trimming has completed
Output file(s) will be GZIP compressed

Cutadapt seems to be fairly up-to-date (version 3.3). Setting -j 1
Writing final adapter and quality trimmed output to SRR1039510_2_trimmed.fq.gz


  >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /trainee2/Mar23/Data/rawdata/fq//SRR1039510_2.fastq.gz <<< 
This is cutadapt 3.3 with Python 3.7.0
Command line parameters: -j 1 -e 0.1 -q 20 -O 3 -a AGATCGGAAGAGC /trainee2/Mar23/Data/rawdata/fq//SRR1039510_2.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 0.38 s (15 µs/read; 3.92 M reads/minute).

=== Summary ===

Total reads processed:                  25,000
Reads with adapters:                       699 (2.8%)
Reads written (passing filters):        25,000 (100.0%)

Total basepairs processed:     1,575,000 bp
Quality-trimmed:                  25,440 bp (1.6%)
Total written (filtered):      1,545,973 bp (98.2%)

=== Adapter 1 ===

Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 699 times

No. of allowed errors:
1-9 bp: 0; 10-13 bp: 1

Bases preceding removed adapters:
  A: 30.8%
  C: 32.5%
  G: 22.9%
  T: 13.6%
  none/other: 0.3%

Overview of removed sequences
length  count  expect  max.err  error counts
3       507    390.6   0        507
4       126    97.7    0        126
5       28     24.4    0        28
6       1      6.1     0        1
7       1      1.5     0        1
10      1      0.0     1        1
11      3      0.0     1        2 1
13      1      0.0     1        1
15      1      0.0     1        1
20      1      0.0     1        1
21      2      0.0     1        2
23      1      0.0     1        0 1
24      1      0.0     1        1
25      1      0.0     1        1
29      2      0.0     1        2
33      1      0.0     1        1
38      1      0.0     1        1
40      1      0.0     1        1
43      1      0.0     1        1
44      2      0.0     1        2
46      1      0.0     1        1
48      2      0.0     1        2
50      1      0.0     1        1
51      1      0.0     1        1
52      1      0.0     1        1
57      1      0.0     1        1
60      4      0.0     1        4
62      3      0.0     1        3
63      2      0.0     1        1 1

RUN STATISTICS FOR INPUT FILE: /trainee2/Mar23/Data/rawdata/fq//SRR1039510_2.fastq.gz
=============================================
25000 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)

Validate paired-end files SRR1039510_1_trimmed.fq.gz and SRR1039510_2_trimmed.fq.gz
file_1: SRR1039510_1_trimmed.fq.gz, file_2: SRR1039510_2_trimmed.fq.gz


>>>>> Now validing the length of the 2 paired-end infiles: SRR1039510_1_trimmed.fq.gz and SRR1039510_2_trimmed.fq.gz <<<<<
Writing validated paired-end Read 1 reads to SRR1039510_1_val_1.fq.gz
Writing validated paired-end Read 2 reads to SRR1039510_2_val_2.fq.gz

Total number of sequences analysed: 25000

Number of sequence pairs removed because at least one read was shorter than the length cutoff (36 bp): 626 (2.50%)
Number of sequence pairs removed because at least one read contained more N(s) than the specified limit of 3: 89 (0.36%)


  >>> Now running FastQC on the validated data SRR1039510_1_val_1.fq.gz<<<

Started analysis of SRR1039510_1_val_1.fq.gz
Approx 5% complete for SRR1039510_1_val_1.fq.gz
Approx 10% complete for SRR1039510_1_val_1.fq.gz
Approx 15% complete for SRR1039510_1_val_1.fq.gz
Approx 20% complete for SRR1039510_1_val_1.fq.gz
Approx 25% complete for SRR1039510_1_val_1.fq.gz
Approx 30% complete for SRR1039510_1_val_1.fq.gz
Approx 35% complete for SRR1039510_1_val_1.fq.gz
Approx 40% complete for SRR1039510_1_val_1.fq.gz
Approx 45% complete for SRR1039510_1_val_1.fq.gz
Approx 50% complete for SRR1039510_1_val_1.fq.gz
Approx 55% complete for SRR1039510_1_val_1.fq.gz
Approx 60% complete for SRR1039510_1_val_1.fq.gz
Approx 65% complete for SRR1039510_1_val_1.fq.gz
Approx 70% complete for SRR1039510_1_val_1.fq.gz
Approx 75% complete for SRR1039510_1_val_1.fq.gz
Approx 80% complete for SRR1039510_1_val_1.fq.gz
Approx 85% complete for SRR1039510_1_val_1.fq.gz
Approx 90% complete for SRR1039510_1_val_1.fq.gz
Approx 95% complete for SRR1039510_1_val_1.fq.gz
Analysis complete for SRR1039510_1_val_1.fq.gz

  >>> Now running FastQC on the validated data SRR1039510_2_val_2.fq.gz<<<

Started analysis of SRR1039510_2_val_2.fq.gz
Approx 5% complete for SRR1039510_2_val_2.fq.gz
Approx 10% complete for SRR1039510_2_val_2.fq.gz
Approx 15% complete for SRR1039510_2_val_2.fq.gz
Approx 20% complete for SRR1039510_2_val_2.fq.gz
Approx 25% complete for SRR1039510_2_val_2.fq.gz
Approx 30% complete for SRR1039510_2_val_2.fq.gz
Approx 35% complete for SRR1039510_2_val_2.fq.gz
Approx 40% complete for SRR1039510_2_val_2.fq.gz
Approx 45% complete for SRR1039510_2_val_2.fq.gz
Approx 50% complete for SRR1039510_2_val_2.fq.gz
Approx 55% complete for SRR1039510_2_val_2.fq.gz
Approx 60% complete for SRR1039510_2_val_2.fq.gz
Approx 65% complete for SRR1039510_2_val_2.fq.gz
Approx 70% complete for SRR1039510_2_val_2.fq.gz
Approx 75% complete for SRR1039510_2_val_2.fq.gz
Approx 80% complete for SRR1039510_2_val_2.fq.gz
Approx 85% complete for SRR1039510_2_val_2.fq.gz
Approx 90% complete for SRR1039510_2_val_2.fq.gz
Approx 95% complete for SRR1039510_2_val_2.fq.gz
Analysis complete for SRR1039510_2_val_2.fq.gz
Deleting both intermediate output files SRR1039510_1_trimmed.fq.gz and SRR1039510_2_trimmed.fq.gz

====================================================================================================
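Two numbers in the cutadapt reports above can be reproduced by hand. For the exact-match rows (max.err 0), the `expect` column is consistent with `reads × 0.25^length`. This is roughly the number of reads whose last few bases would spell the start of the adapter purely by chance. The "No. of allowed errors" line matches `floor(length × 0.1)` for the default error rate `-e 0.1`. The sketch below assumes uniform base composition and is not cutadapt's exact internal computation; `adapter_expectations` is a hypothetical helper name. It shows why most of the 714 (R1) and 699 (R2) "reads with adapters" are 3 to 5 bp overlaps close to what chance alone predicts (524 observed vs. 390.6 expected at 3 bp). The longer matches, which are expected about 0 times, are genuine adapter read-through.

```shell
#!/usr/bin/env bash
# Reproduce the "expect" and "allowed errors" figures of a cutadapt report
# under a simple model (assumption: each base occurs with probability 0.25).
#   expect(L)         = reads * 0.25^L          (approximate chance matches of length L)
#   allowed_errors(L) = floor(L * error_rate)   (error_rate = -e, default 0.1)
# Usage: adapter_expectations <reads> [error_rate] [max_length]
adapter_expectations() {
    local reads="$1" error_rate="${2:-0.1}" max_len="${3:-13}"
    awk -v n="$reads" -v e="$error_rate" -v m="$max_len" 'BEGIN {
        printf "length\texpect\tallowed_errors\n"
        for (L = 3; L <= m; L++)
            printf "%d\t%.1f\t%d\n", L, n * 0.25 ^ L, int(L * e)
    }'
}

# 25000 reads, as in SRR1039510 above.
adapter_expectations 25000
```

For lengths 3 to 8 this prints 390.6, 97.7, 24.4, 6.1, 1.5 and 0.4, the same values as the `expect` column in both reports. The allowed-error column is 0 up to 9 bp and 1 from 10 to 13 bp.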

(rna) Mar23 23:11:54 ~/Data/cleandata/trim_galore
$ ls
SRR1039510_1.fastq.gz_trimming_report.txt
SRR1039510_1_val_1_fastqc.html
SRR1039510_1_val_1_fastqc.zip
SRR1039510_1_val_1.fq.gz
SRR1039510_2.fastq.gz_trimming_report.txt
SRR1039510_2_val_2_fastqc.html
SRR1039510_2_val_2_fastqc.zip
SRR1039510_2_val_2.fq.gz
(rna) Mar23 23:12:56 ~/Data/cleandata/trim_galore
$ ll
total 4400
drwxrwxr-x 2 Mar23 Mar23    4096 Apr 15 23:11 ./
drwxrwxr-x 4 Mar23 Mar23    4096 Apr 15 22:59 ../
-rw-rw-r-- 1 Mar23 Mar23    2235 Apr 15 23:11 SRR1039510_1.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23  641928 Apr 15 23:11 SRR1039510_1_val_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23  316209 Apr 15 23:11 SRR1039510_1_val_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1274714 Apr 15 23:11 SRR1039510_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23    2536 Apr 15 23:11 SRR1039510_2.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23  648744 Apr 15 23:11 SRR1039510_2_val_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23  314805 Apr 15 23:11 SRR1039510_2_val_2_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1279918 Apr 15 23:11 SRR1039510_2_val_2.fq.gz
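The `*_trimming_report.txt` files keep the same cutadapt summary that was printed to the terminal. Once several samples are trimmed, a small table is handier than opening each report. The sketch below assumes the report lines look exactly like the log above (for example `Reads with adapters:   714 (2.9%)`); `summarize_reports` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Tabulate key cutadapt summary lines from Trim Galore trimming reports.
# Assumption: lines start with "Total reads processed:", "Reads with adapters:"
# and "Quality-trimmed:" as in the terminal log; commas are thousands separators.
summarize_reports() {
    printf "report\ttotal_reads\treads_with_adapters\tquality_trimmed_bp\n"
    local f total adapt qual
    for f in "$@"; do
        # Whitespace-split fields: "Total reads processed: 25,000" -> $4 = 25,000
        total=$(awk '/^Total reads processed:/ {gsub(",", "", $4); print $4}' "$f")
        # "Reads with adapters: 714 (2.9%)" -> $4 = 714, $5 = (2.9%)
        adapt=$(awk '/^Reads with adapters:/ {gsub(",", "", $4); print $4, $5}' "$f")
        # "Quality-trimmed: 13,073 bp (0.8%)" -> $2 = 13,073, $4 = (0.8%)
        qual=$(awk '/^Quality-trimmed:/ {gsub(",", "", $2); print $2, $4}' "$f")
        printf "%s\t%s\t%s\t%s\n" "$(basename "$f")" "$total" "$adapt" "$qual"
    done
}
```

Usage, inside `~/Data/cleandata/trim_galore`: `summarize_reports *_trimming_report.txt`.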
(rna) Mar23 23:28:53 ~/Data/cleandata/trim_galore
$ cat /trainee2/Mar23/Data/rawdata/fq/sample.ID | while read id
> do
> echo "trim_galore --phred33 -q 20 --length 36 --stringency 3 --fastqc --paired --max_n 3 -o ${cleandata} ${rawdata}/${id}_1.fastq.gz ${rawdata}/${id}_2.fastq.gz"
> done >trim_galore.sh
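The loop above writes one `trim_galore` command per line of `sample.ID` into `trim_galore.sh`. Below is a slightly more defensive bash variant of the same idea, wrapped in a hypothetical function `make_trim_script`. It reads the ID file directly with `read -r` and skips blank lines. It also keeps a final line that lacks a trailing newline, and it strips a trailing `/` from the directories so the paths contain no `//` (harmless, as the `ls` check below shows, but untidy). As in the original loop, the commands are written unquoted, so paths are assumed to contain no spaces.

```shell
#!/usr/bin/env bash
# Write one trim_galore command per sample ID into a script; nothing is executed here.
# Arguments: sample-ID file, raw fastq directory, output directory, script to write.
# Assumes raw files are named <id>_1.fastq.gz and <id>_2.fastq.gz, as in this post.
make_trim_script() {
    local ids="$1" rawdir="${2%/}" outdir="${3%/}" script="$4"   # "%/" drops one trailing slash
    local id
    while IFS= read -r id || [ -n "$id" ]; do   # the "||" part keeps a last line without newline
        [ -z "$id" ] && continue                # skip blank lines
        printf 'trim_galore --phred33 -q 20 --length 36 --stringency 3 --fastqc --paired --max_n 3 -o %s %s %s\n' \
            "$outdir" "$rawdir/${id}_1.fastq.gz" "$rawdir/${id}_2.fastq.gz"
    done < "$ids" > "$script"
}
```

Usage with the variables defined earlier: `make_trim_script /trainee2/Mar23/Data/rawdata/fq/sample.ID "$rawdata" "$cleandata" trim_galore.sh`.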
(rna) Mar23 23:29:27 ~/Data/cleandata/trim_galore
$ ls
SRR1039510_1.fastq.gz_trimming_report.txt
SRR1039510_1_val_1_fastqc.html
SRR1039510_1_val_1_fastqc.zip
SRR1039510_1_val_1.fq.gz
SRR1039510_2.fastq.gz_trimming_report.txt
SRR1039510_2_val_2_fastqc.html
SRR1039510_2_val_2_fastqc.zip
SRR1039510_2_val_2.fq.gz
trim_galore.sh
(rna) Mar23 23:29:50 ~/Data/cleandata/trim_galore
$ less trim_galore.sh
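Reviewing `trim_galore.sh` is easier when you know what each flag does. The sketch below builds the same command as a bash array, with one option per line and a comment on each. The descriptions summarize Trim Galore's help text and the "SUMMARISING RUN PARAMETERS" block printed earlier. `run_trim_galore` and the `DRY_RUN` variable are hypothetical names. By default the function only prints the command; set `DRY_RUN=0` to actually run it.

```shell
#!/usr/bin/env bash
# Build the trim_galore call used in this post as a bash array so each option
# can carry a comment and file paths are passed as single arguments.
run_trim_galore() {
    local outdir="$1" r1="$2" r2="$3"
    local cmd=(
        trim_galore
        --phred33         # qualities are ASCII+33 (Sanger / Illumina 1.8+ encoding)
        -q 20             # trim low-quality 3' ends below Phred 20
        --length 36       # drop the pair if either read ends up shorter than 36 bp
        --stringency 3    # require at least 3 bp of overlap with the adapter before trimming
        --fastqc          # run FastQC on the validated output
        --paired          # treat r1/r2 as mates; outputs *_val_1 / *_val_2
        --max_n 3         # drop the pair if either read contains more than 3 Ns
        -o "$outdir"      # output directory
        "$r1" "$r2"
    )
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "${cmd[*]}"   # dry run: print the command line only
    else
        "${cmd[@]}"
    fi
}
```

Usage: `DRY_RUN=0 run_trim_galore "$cleandata" "$rawdata/SRR1039510_1.fastq.gz" "$rawdata/SRR1039510_2.fastq.gz"`.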
(rna) Mar23 23:30:41 ~/Data/cleandata/trim_galore
$ ls /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
(rna) Mar23 23:30:45 ~/Data/cleandata/trim_galore
$ nohup sh trim_galore.sh >trim_galore.log &
[1] 21186
nohup: ignoring input and redirecting stderr to stdout
(rna) Mar23 23:30:55 ~/Data/cleandata/trim_galore
$ ll
total 9856
drwxrwxr-x 2 Mar23 Mar23    4096 Apr 15 23:31 ./
drwxrwxr-x 4 Mar23 Mar23    4096 Apr 15 22:59 ../
-rw-rw-r-- 1 Mar23 Mar23    2234 Apr 15 23:30 SRR1039510_1.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23  641928 Apr 15 23:30 SRR1039510_1_val_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23  316209 Apr 15 23:30 SRR1039510_1_val_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1274714 Apr 15 23:30 SRR1039510_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23    2535 Apr 15 23:30 SRR1039510_2.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23  648744 Apr 15 23:31 SRR1039510_2_val_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23  314805 Apr 15 23:31 SRR1039510_2_val_2_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1279918 Apr 15 23:30 SRR1039510_2_val_2.fq.gz
-rw-rw-r-- 1 Mar23 Mar23    2161 Apr 15 23:31 SRR1039511_1.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23  647884 Apr 15 23:31 SRR1039511_1_val_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23  317675 Apr 15 23:31 SRR1039511_1_val_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1290072 Apr 15 23:31 SRR1039511_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23    2494 Apr 15 23:31 SRR1039511_2.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23  646340 Apr 15 23:31 SRR1039511_2_val_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23  317650 Apr 15 23:31 SRR1039511_2_val_2_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1269066 Apr 15 23:31 SRR1039511_2_val_2.fq.gz
-rw-rw-r-- 1 Mar23 Mar23     757 Apr 15 23:31 SRR1039512_1.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23 1048576 Apr 15 23:31 SRR1039512_1_trimmed.fq.gz
-rw-rw-r-- 1 Mar23 Mar23   20646 Apr 15 23:31 trim_galore.log
-rw-rw-r-- 1 Mar23 Mar23     720 Apr 15 23:29 trim_galore.sh
(rna) Mar23 23:31:08 ~/Data/cleandata/trim_galore
$ less trim_galore.log
[1]+  Done                    nohup sh trim_galore.sh > trim_galore.log
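`nohup ... &` returns immediately, and the only sign of completion here is the `[1]+ Done` line. Inside a larger pipeline script you usually want to wait for the job and check whether it succeeded before moving on. A minimal bash sketch, assuming a POSIX `nohup`, with the hypothetical function name `run_in_background`:

```shell
#!/usr/bin/env bash
# Start a script with nohup in the background, then wait for it and report
# its exit status. Redirecting stdin from /dev/null and stderr into the log
# makes nohup's redirections explicit, so it prints no "ignoring input" notice.
run_in_background() {
    local script="$1" log="$2"
    nohup sh "$script" < /dev/null > "$log" 2>&1 &
    local pid=$!
    echo "started $script as PID $pid"
    wait "$pid"            # returns the background job's exit status
    local status=$?
    echo "$script finished with exit status $status"
    return "$status"
}
```

Usage with the script from this post: `run_in_background trim_galore.sh trim_galore.log`. Note that `sh trim_galore.sh` exits with the status of its last command only, so an earlier failing sample would go unnoticed. Putting `set -e` on the first line of the generated script makes it stop at the first failing command.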
(rna) Mar23 23:32:04 ~/Data/cleandata/trim_galore
$ ll *fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1274714 Apr 15 23:30 SRR1039510_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1279918 Apr 15 23:30 SRR1039510_2_val_2.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1290072 Apr 15 23:31 SRR1039511_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1269066 Apr 15 23:31 SRR1039511_2_val_2.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1306804 Apr 15 23:31 SRR1039512_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1331250 Apr 15 23:31 SRR1039512_2_val_2.fq.gz
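The earlier `ll` caught the job mid-run, while `SRR1039512_1_trimmed.fq.gz` was still being written. It is therefore worth confirming that every sample finished with both mates and that the mates hold the same number of reads, which Trim Galore's paired-end validation is meant to guarantee. A sketch with the hypothetical function name `check_val_pairs`, counting 4 lines per FASTQ record:

```shell
#!/usr/bin/env bash
# Check that each *_1_val_1.fq.gz has a *_2_val_2.fq.gz partner with the same
# number of reads. Prints one line per sample; returns 1 if any pair is
# missing or unequal, 0 otherwise.
check_val_pairs() {
    local dir="${1:-.}" r1 r2 n1 n2 status=0
    for r1 in "$dir"/*_1_val_1.fq.gz; do
        [ -e "$r1" ] || continue                  # no match: the glob stays literal
        r2="${r1%_1_val_1.fq.gz}_2_val_2.fq.gz"
        if [ ! -e "$r2" ]; then
            echo "MISSING $(basename "$r2")"; status=1; continue
        fi
        n1=$(( $(gzip -dc "$r1" | wc -l) / 4 ))   # 4 lines per FASTQ record
        n2=$(( $(gzip -dc "$r2" | wc -l) / 4 ))
        if [ "$n1" -eq "$n2" ]; then
            echo "OK $(basename "${r1%_1_val_1.fq.gz}") $n1 read pairs"
        else
            echo "MISMATCH $(basename "${r1%_1_val_1.fq.gz}") R1=$n1 R2=$n2"; status=1
        fi
    done
    return "$status"
}
```

Usage, inside `~/Data/cleandata/trim_galore`: `check_val_pairs .`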
© Copyright belongs to the author. For reprints or content collaboration, please contact the author.