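SRA data
The SRA format is the compressed format in which the SRA (Sequence Read Archive) database stores raw next-generation sequencing data. Files in this format cannot be processed directly; they must first be converted to FASTQ before quality control, adapter removal, and similar steps.
The fastq-dump command is part of the sra-tools package.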
(rna) Mar23 22:50:08 ~/Data/rawdata/sra
$ ls
CHECK raw_md5.txt SRR1039511
filereport_read_run_PRJNA229998_tsv.txt sra.url SRR1039512
md5.txt SRR1039510
(rna) Mar23 00:18:50 ~/Data/rawdata/sra
$ ll
total 28
drwxrwxr-x 2 Mar23 Mar23 4096 Apr 10 22:39 ./
drwxrwxr-x 3 Mar23 Mar23 4096 Apr 4 23:19 ../
-rw-rw-r-- 1 Mar23 Mar23 45 Apr 10 22:25 CHECK
lrwxrwxrwx 1 Mar23 Mar23 68 Apr 6 23:23 filereport_read_run_PRJNA229998_tsv.txt -> /teach/t_rna/data/airway/sra/filereport_read_run_PRJNA229998_tsv.txt
-rw-rw-r-- 1 Mar23 Mar23 720 Apr 10 22:30 md5.txt
-rw-rw-r-- 1 Mar23 Mar23 135 Apr 10 22:39 raw_md5.txt
-rw-rw-r-- 1 Mar23 Mar23 816 Apr 6 23:29 sra.url
lrwxrwxrwx 1 Mar23 Mar23 39 Apr 10 21:21 SRR1039510 -> /teach/t_rna/data/airway/sra/SRR1039510
lrwxrwxrwx 1 Mar23 Mar23 39 Apr 10 21:21 SRR1039511 -> /teach/t_rna/data/airway/sra/SRR1039511
lrwxrwxrwx 1 Mar23 Mar23 39 Apr 10 21:21 SRR1039512 -> /teach/t_rna/data/airway/sra/SRR1039512
(rna) Mar23 00:19:01 ~/Data/rawdata/sra
$ fast
fasterq-dump fastq-dump.2
fasterq-dump.2 fastq-dump.2.10.7
fasterq-dump.2.10.7 fastq-dump.2.3.5
fasterq-dump-orig fastq-dump-orig
fasterq-dump-orig.2.10.7 fastq-dump-orig.2.10.7
fastp fastq-load
fastqc fastq-load.2
fastq-dump fastq-load.2.3.5
(rna) Mar23 00:19:01 ~/Data/rawdata/sra
$ fastq
fastqc fastq-dump-orig
fastq-dump fastq-dump-orig.2.10.7
fastq-dump.2 fastq-load
fastq-dump.2.10.7 fastq-load.2
fastq-dump.2.3.5 fastq-load.2.3.5
(rna) Mar23 00:19:01 ~/Data/rawdata/sra
$ fastq-dump -h
Usage: fastq-dump [ options ] [ accessions(s)... ]
Parameters:
accessions(s) list of accessions to process
Options:
-A|--accession <accession> Replaces accession derived from <path> in
filename(s) and deflines (only for
single table dump)
--table <table-name> Table name within cSRA object, default is
"SEQUENCE"
--split-spot Split spots into individual reads
-N|--minSpotId <rowid> Minimum spot id
-X|--maxSpotId <rowid> Maximum spot id
--spot-groups <[list]>[,...] Filter by SPOT_GROUP (member): name[,...]
-W|--clip Remove adapter sequences from reads
-M|--minReadLen <len> Filter by sequence length >= <len>
-R|--read-filter <filter> Split into files by READ_FILTER value
[split], optionally filter by value:
[pass|reject|criteria|redacted]
-E|--qual-filter Filter used in early 1000 Genomes data: no
sequences starting or ending with >= 10N
--qual-filter-1 Filter used in current 1000 Genomes data
--aligned Dump only aligned sequences
--unaligned Dump only unaligned sequences
--aligned-region <name[:from-to]>
Filter by position on genome. Name can
eiter by accession.version (ex:
NC_000001.10) or file specific name (ex:
"chr1" or "1". "from" and "to" are
1-based coordinates
--matepair_distance <from-to|unknown>
Filter by distance between matepairs. Use
"unknown" to find matepairs split
between the references. Use from-to to
limit matepair distance on the same
reference
--skip-technical Dump only biological reads
-O|--outdir <path> Output directory, default is working
directory '.'
-Z|--stdout Output to stdout, all split data become
joined into single stream
--gzip Compress output using gzip: deprecated,
not recommended
--bzip2 Compress output using bzip2: deprecated,
not recommended
--split-files Write reads into separate files. Read
number will be suffixed to the file
name. NOTE! The `--split-3` option is
recommended. In cases where not all
spots have the same number of reads,
this option will produce files that WILL
CAUSE ERRORS in most programs which
process split pair fastq files.
--split-e 3-way splitting for mate-pairs. For each
spot, if there are two biological reads
satisfying filter conditions, the first
is placed in the `*_1.fastq` file, and
the second is placed in the `*_2.fastq`
file. If there is only one biological
read satisfying the filter conditions,
it is placed in the `*.fastq` file.All
other reads in the spot are ignored.
-G|--spot-group Split into files by SPOT_GROUP (member
name)
-T|--group-in-dirs Split into subdirectories instead of files
-K|--keep-empty-files Do not delete empty files
-C|--dumpcs <cskey> Formats sequence using color space
(default for SOLiD), "cskey" may be
specified for translation or else
specify "dflt" to use the default value
-B|--dumpbase Formats sequence using base space (default
for other than SOLiD).
-Q|--offset <integer Offset to use for quality conversion,
default is 33
--fasta <line-width> FASTA only, no qualities, with can be
"default" or "0" for no wrapping
--suppress-qual-for-cskey suppress quality-value for cskey
-F|--origfmt Defline contains only original sequence
name
-I|--readids Append read id after spot id as
'accession.spot.readid' on defline
--helicos Helicos style defline
--defline-seq <fmt> Defline format specification for sequence.
--defline-qual <fmt> Defline format specification for quality.
<fmt> is string of characters and/or
variables. The variables can be one of:
$ac - accession, $si spot id, $sn spot
name, $sg spot group (barcode), $sl spot
length in bases, $ri read number, $rn
read name, $rl read length in bases.
'[]' could be used for an optional
output: if all vars in [] yield empty
values whole group is not printed. Empty
value is empty string or for numeric
variables. Ex: @$sn[_$rn]/$ri '_$rn' is
omitted if name is empty
--ngc <path> <path> to ngc file
--perm <path> <path> to permission file
--location <location> location in cloud
--cart <path> <path> to cart file
--disable-multithreading disable multithreading
-V|--version Display the version of the program
-L|--log-level <level> Logging level as number or enum string.
One of
(fatal|sys|int|err|warn|info|debug) or
(0-6) Current/default is warn
--option-file file Read more options and parameters from the
file.
-h|--help print this message
"fastq-dump" version 2.10.7
(rna) Mar23 00:23:03 ~/Data/rawdata/sra
$ fastq-dump --gzip --split-3 -X 25000 -O ./ SRR1039510
Read 25000 spots for SRR1039510
Written 25000 spots for SRR1039510
(rna) Mar23 00:24:46 ~/Data/rawdata/sra
$ ls
CHECK SRR1039510
filereport_read_run_PRJNA229998_tsv.txt SRR1039510_1.fastq.gz
md5.txt SRR1039510_2.fastq.gz
raw_md5.txt SRR1039511
sra.url SRR1039512
(rna) Mar23 00:25:27 ~/Data/rawdata/sra
$ zless -S SRR1039510_1.fastq.gz # view the decompressed file without line wrapping
@SRR1039510.1 HWI-ST177:290:C0TECACXX:1:1101:1373:2104 length=63
TGGGAGGCTGAGGCAGGAGAATCACTTAAACCTGGGAGGCAGAGGTTACAGTGAGCCGAGATT
+SRR1039510.1 HWI-ST177:290:C0TECACXX:1:1101:1373:2104 length=63
HJJJIJJJJJJJJIJJJGHHIJIIIIIIJJEHGGIJGIJIJJIJHHHGGFFDFFFDEDDDBDC
@SRR1039510.2 HWI-ST177:290:C0TECACXX:1:1101:1340:2124 length=63
AAAGAAGGCGACAGTGAGAAGGAGTCCGAGAAGAGTGATGGAGACCCAATAGTCGATCCTGAG
+SRR1039510.2 HWI-ST177:290:C0TECACXX:1:1101:1340:2124 length=63
HJJJJJJJJJJJIJIIGIJJJJGJHJJJHHDFFFE@CEEEDDDDDDDDDDDDDDDBDDDDDDD
@SRR1039510.3 HWI-ST177:290:C0TECACXX:1:1101:1273:2183 length=63
CTGCTGGGCCCCAAGGTCCTCCTGGTCCCAGTGGTGAAGAAGGAAAGAGAGGCCCTAATGGGG
+SRR1039510.3 HWI-ST177:290:C0TECACXX:1:1101:1273:2183 length=63
HJJJJJJJJJJJJJJJGIIIJJJJJHIJJJJHIJFHGIJJJJJJJHHHHHFFFDDDEDDDDDD
@SRR1039510.4 HWI-ST177:290:C0TECACXX:1:1101:1562:2147 length=63
CTTGGCTGCAGCCATCCCGCTTAGCCTGCCTCACCCACACCCGTGTGGTACCTTCAGCCCTGG
+SRR1039510.4 HWI-ST177:290:C0TECACXX:1:1101:1562:2147 length=63
HJJJJJJJJJJJJJJJJJIJJJJJJJJJJJJIJHJJIJJJJJHHFFFFEEEEEEEDDDDDDDB
@SRR1039510.5 HWI-ST177:290:C0TECACXX:1:1101:1577:2181 length=63
TGAGACAGGTAATTCAGTATAGTAGATTAATATTTTTAATATATATTTTCCCTTAAGATTTCC
+SRR1039510.5 HWI-ST177:290:C0TECACXX:1:1101:1577:2181 length=63
HIJJJJJJJEHJIJJJJIIIJJIIJJJJJJJJJJJJJJJJJJJJJJJJJEHJGI>FFCBGGGI
@SRR1039510.6 HWI-ST177:290:C0TECACXX:1:1101:1650:2181 length=63
ATTTCTCAGTGTAGAAATCATGTCTTCTTAATTGCTGAACCTTACTGCAAAAACTTGTGATGT
+SRR1039510.6 HWI-ST177:290:C0TECACXX:1:1101:1650:2181 length=63
HJJJJJJJJJJJHIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJIJJJJJJJJHHIJJD
SRR1039510_1.fastq.gz
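Each FASTQ record shown above spans four lines: an @ header, the bases, a + separator line, and one quality character per base. As a minimal sketch, assuming the Phred+33 encoding that the fastq-dump help text lists as its default offset, the helper below converts a quality character to its score. The name phred33 is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: print the Phred score of one quality character,
# assuming an ASCII offset of 33 (the fastq-dump default).
phred33() {
    # printf with a leading single quote prints the character's numeric code
    code=$(printf '%d' "'$1")
    echo $((code - 33))
}

phred33 H   # first quality character of read SRR1039510.1 (ASCII 72)
phred33 J   # the most common character in the reads above (ASCII 74)
```

Under this encoding a score Q corresponds to an estimated error probability of 10^(-Q/10), so higher characters mean more reliable base calls.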
(rna) Mar23 00:29:32 ~/Data/rawdata/sra
$ rm -rf SRR103951*gz # delete the files just created; the output should not go into this directory
(rna) Mar23 00:29:51 ~/Data/rawdata/sra
$ ls
CHECK raw_md5.txt SRR1039511
filereport_read_run_PRJNA229998_tsv.txt sra.url SRR1039512
md5.txt SRR1039510
(rna) Mar23 00:29:58 ~/Data/rawdata/sra
$ cd ..
(rna) Mar23 00:30:11 ~/Data/rawdata
$ ls
sra
(rna) Mar23 00:30:12 ~/Data/rawdata
$ mkdir fq
(rna) Mar23 00:30:41 ~/Data/rawdata
$ ls
fq sra
(rna) Mar23 00:30:42 ~/Data/rawdata
$ cd fq/
(rna) Mar23 00:30:49 ~/Data/rawdata/fq
$ ls
(rna) Mar23 10:46:36 ~/Data/rawdata/fq
$ ln -s /teach/t_rna/data/airway/sra/sample.ID
(rna) Mar23 10:46:58 ~/Data/rawdata/fq
$ ls
sample.ID
(rna) Mar23 10:47:00 ~/Data/rawdata/fq
$ cat sample.ID
SRR1039510
SRR1039511
SRR1039512
(rna) Mar23 20:24:32 ~/Data/rawdata/fq
$ ls /trainee2/Mar23/Data/rawdata/sra/SRR* | while read id
do
echo "fastq-dump --gzip --split-e -X 25000 -O ${fqdir} ${id}"
done >sra2fq.sh
(rna) Mar23 18:48:53 ~/Data/rawdata/fq
$ ls
sample.ID sra2fq.sh
(rna) Mar23 18:48:57 ~/Data/rawdata/fq
$ less sra2fq.sh
fastq-dump --gzip --split-e -X 25000 -O /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/sra/SRR1039510
fastq-dump --gzip --split-e -X 25000 -O /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/sra/SRR1039510_1.fastq.gz
fastq-dump --gzip --split-e -X 25000 -O /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/sra/SRR1039510_2.fastq.gz
fastq-dump --gzip --split-e -X 25000 -O /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/sra/SRR1039511
fastq-dump --gzip --split-e -X 25000 -O /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/sra/SRR1039512
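Note that the generated script above also lists SRR1039510_1.fastq.gz and SRR1039510_2.fastq.gz as inputs, because the SRR* glob matches every file in the sra directory, and that ${fqdir} has to be set before the loop runs. One possible alternative, sketched here under the assumption that sample.ID holds one accession per line, builds the commands from the accession list instead. The function name make_sra2fq is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read accessions on stdin and print one fastq-dump
# command per accession. $1 = directory with the SRA files, $2 = output dir.
make_sra2fq() {
    sradir=$1
    fqdir=$2
    while read -r id; do
        [ -n "$id" ] || continue    # skip blank lines
        echo "fastq-dump --gzip --split-e -X 25000 -O ${fqdir} ${sradir}/${id}"
    done
}

# Demo with the three accessions listed in sample.ID
make_sra2fq /trainee2/Mar23/Data/rawdata/sra /trainee2/Mar23/Data/rawdata/fq <<'EOF'
SRR1039510
SRR1039511
SRR1039512
EOF
```

In the session above this could be used as `make_sra2fq "$sradir" "$fqdir" < sample.ID > sra2fq.sh`, which yields exactly one line per accession regardless of what else sits in the sra directory.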
(rna) Mar23 20:39:19 ~
$ nohup sh sra2fq.sh >sra2fq.log & # nohup together with & runs the job in the background, detached from the terminal
[1] 26338
(rna) Mar23 20:39:58 ~
$ nohup: ignoring input and redirecting stderr to stdout
nohup sh sra2fq.sh >sra2fq.log # sra2fq.log is the log file
[1]+ Done nohup sh sra2fq.sh > sra2fq.log
(rna) Mar23 19:11:38 ~/Data/rawdata/fq
$ jobs
[1]+ Done nohup sh sra2fq.sh > sra2fq.log
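The nohup ... & pattern used above can be sketched in a self-contained form. This is only an illustration: job.sh is a stand-in for sra2fq.sh, and the temporary directory and variable names are assumptions made so that the example runs anywhere.

```shell
#!/bin/sh
# Stand-in job script; in the session above this would be sra2fq.sh.
workdir=$(mktemp -d)
printf 'echo "step 1 done"\necho "step 2 done"\n' > "$workdir/job.sh"

# Redirect stdout and stderr to the log explicitly, then detach with '&'.
nohup sh "$workdir/job.sh" > "$workdir/job.log" 2>&1 &
pid=$!                # PID of the background job

wait "$pid"           # block until it finishes (only works in the launching shell)
status=$?
echo "job $pid exited with status $status"
cat "$workdir/job.log"
```

Checking the exit status (or the log) is a more reliable signal than the "Done" line from jobs, which is only visible in the shell that started the job.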
(rna) Mar23 19:12:03 ~/Data/rawdata/fq
$ ll
total 8560
drwxrwxr-x 2 Mar23 Mar23 4096 Apr 15 19:11 ./
drwxrwxr-x 4 Mar23 Mar23 4096 Apr 11 00:30 ../
lrwxrwxrwx 1 Mar23 Mar23 38 Apr 11 10:46 sample.ID -> /teach/t_rna/data/airway/sra/sample.ID
-rw-rw-r-- 1 Mar23 Mar23 895 Apr 15 19:11 sra2fq.log
-rw-rw-r-- 1 Mar23 Mar23 602 Apr 15 19:07 sra2fq.sh
-rw-rw-r-- 1 Mar23 Mar23 1441724 Apr 15 19:08 SRR1039510_1.fastq.gz
-rw-rw-r-- 1 Mar23 Mar23 1441964 Apr 15 19:08 SRR1039510_2.fastq.gz
-rw-rw-r-- 1 Mar23 Mar23 1446801 Apr 15 19:11 SRR1039511_1.fastq.gz
-rw-rw-r-- 1 Mar23 Mar23 1423626 Apr 15 19:11 SRR1039511_2.fastq.gz
-rw-rw-r-- 1 Mar23 Mar23 1481234 Apr 15 19:11 SRR1039512_1.fastq.gz
-rw-rw-r-- 1 Mar23 Mar23 1502072 Apr 15 19:11 SRR1039512_2.fastq.gz
(rna) Mar23 20:40:49 ~/Data/rawdata/fq
$ zless -S SRR1039510_1.fastq.gz | wc -l
100000
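Note: omit the -X option for a real run. Here -X 25000 limits the dump to the first 25,000 spots, which is why the file has 100,000 lines (4 lines per read); without it, fastq-dump converts the entire run.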
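The line count above translates into a read count by dividing by four. A minimal sketch, assuming unwrapped FASTQ (exactly four lines per record, as in the fastq-dump output shown earlier); count_reads is a hypothetical helper name and the demo file is synthetic:

```shell
#!/bin/sh
# Hypothetical helper: number of reads in a gzipped FASTQ file.
count_reads() {
    # gzip -dc streams the decompressed content; one record = 4 lines
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Tiny two-read demo file so the sketch is self-contained
demo=$(mktemp -d)/demo.fastq.gz
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n' | gzip -c > "$demo"
count_reads "$demo"
```

Applied to SRR1039510_1.fastq.gz from this session, the same division gives 100000 / 4 = 25000 reads, matching the -X 25000 limit.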
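Data quality control
FastQC
FastQC computes quality statistics for raw data in FASTQ format and evaluates the sequencing results, providing a reference for the subsequent trimming and filtering steps.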
(rna) Mar23 20:40:59 ~/Data/rawdata/fq
$ cd ..
(rna) Mar23 20:48:01 ~/Data/rawdata
$ ls
fq sra
(rna) Mar23 20:48:02 ~/Data/rawdata
$ mkdir qc
(rna) Mar23 20:48:09 ~/Data/rawdata
$ ls
fq qc sra
(rna) Mar23 20:54:46 ~/Data/rawdata
$ cd qc/
(rna) Mar23 20:56:02 ~/Data/rawdata/qc
$ ls
(rna) Mar23 20:56:04 ~/Data/rawdata/qc
$ pwd
/trainee2/Mar23/Data/rawdata/qc
(rna) Mar23 20:56:29 ~/Data/rawdata/qc
$ ls /trainee2/Mar23/Data/rawdata/fq/
sample.ID SRR1039510_1.fastq.gz SRR1039511_2.fastq.gz
sra2fq.log SRR1039510_2.fastq.gz SRR1039512_1.fastq.gz
sra2fq.sh SRR1039511_1.fastq.gz SRR1039512_2.fastq.gz
(rna) Mar23 20:56:43 ~/Data/rawdata/qc
$ qcdir=/trainee2/Mar23/Data/rawdata/qc/
(rna) Mar23 20:57:04 ~/Data/rawdata/qc
$ fqdir=/trainee2/Mar23/Data/rawdata/fq/
(rna) Mar23 20:57:15 ~/Data/rawdata/qc
$ fastqc -t 6 -o $qcdir $fqdir/SRR1039510_1.fastq.gz
Started analysis of SRR1039510_1.fastq.gz
Approx 5% complete for SRR1039510_1.fastq.gz
Approx 10% complete for SRR1039510_1.fastq.gz
Approx 15% complete for SRR1039510_1.fastq.gz
Approx 20% complete for SRR1039510_1.fastq.gz
Approx 25% complete for SRR1039510_1.fastq.gz
Approx 30% complete for SRR1039510_1.fastq.gz
Approx 35% complete for SRR1039510_1.fastq.gz
Approx 40% complete for SRR1039510_1.fastq.gz
Approx 45% complete for SRR1039510_1.fastq.gz
Approx 50% complete for SRR1039510_1.fastq.gz
Approx 55% complete for SRR1039510_1.fastq.gz
Approx 60% complete for SRR1039510_1.fastq.gz
Approx 65% complete for SRR1039510_1.fastq.gz
Approx 70% complete for SRR1039510_1.fastq.gz
Approx 75% complete for SRR1039510_1.fastq.gz
Approx 80% complete for SRR1039510_1.fastq.gz
Approx 85% complete for SRR1039510_1.fastq.gz
Approx 90% complete for SRR1039510_1.fastq.gz
Approx 95% complete for SRR1039510_1.fastq.gz
Approx 100% complete for SRR1039510_1.fastq.gz
Analysis complete for SRR1039510_1.fastq.gz
(rna) Mar23 21:04:48 ~/Data/rawdata/qc
$ ls
SRR1039510_1_fastqc.html SRR1039510_1_fastqc.zip
(rna) Mar23 21:05:03 ~/Data/rawdata/qc
$ ls $fqdir/SRR*.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039510_2.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039511_1.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039511_2.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039512_1.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039512_2.fastq.gz
(rna) Mar23 21:07:21 ~/Data/rawdata/qc
$ fastqc -t 10 -o $qcdir $fqdir/SRR*.fastq.gz
Started analysis of SRR1039510_1.fastq.gz
Approx 5% complete for SRR1039510_1.fastq.gz
Approx 10% complete for SRR1039510_1.fastq.gz
Approx 15% complete for SRR1039510_1.fastq.gz
Approx 20% complete for SRR1039510_1.fastq.gz
Approx 25% complete for SRR1039510_1.fastq.gz
Approx 30% complete for SRR1039510_1.fastq.gz
Approx 35% complete for SRR1039510_1.fastq.gz
Approx 40% complete for SRR1039510_1.fastq.gz
Approx 45% complete for SRR1039510_1.fastq.gz
Approx 50% complete for SRR1039510_1.fastq.gz
Approx 55% complete for SRR1039510_1.fastq.gz
Approx 60% complete for SRR1039510_1.fastq.gz
Approx 65% complete for SRR1039510_1.fastq.gz
Approx 70% complete for SRR1039510_1.fastq.gz
Approx 75% complete for SRR1039510_1.fastq.gz
Approx 80% complete for SRR1039510_1.fastq.gz
Approx 85% complete for SRR1039510_1.fastq.gz
Approx 90% complete for SRR1039510_1.fastq.gz
Approx 95% complete for SRR1039510_1.fastq.gz
Approx 100% complete for SRR1039510_1.fastq.gz
Analysis complete for SRR1039510_1.fastq.gz
Started analysis of SRR1039510_2.fastq.gz
Approx 5% complete for SRR1039510_2.fastq.gz
Approx 10% complete for SRR1039510_2.fastq.gz
Approx 15% complete for SRR1039510_2.fastq.gz
Approx 20% complete for SRR1039510_2.fastq.gz
Approx 25% complete for SRR1039510_2.fastq.gz
Approx 30% complete for SRR1039510_2.fastq.gz
Approx 35% complete for SRR1039510_2.fastq.gz
Approx 40% complete for SRR1039510_2.fastq.gz
Approx 45% complete for SRR1039510_2.fastq.gz
Approx 50% complete for SRR1039510_2.fastq.gz
Approx 55% complete for SRR1039510_2.fastq.gz
Approx 60% complete for SRR1039510_2.fastq.gz
Approx 65% complete for SRR1039510_2.fastq.gz
Approx 70% complete for SRR1039510_2.fastq.gz
Approx 75% complete for SRR1039510_2.fastq.gz
Approx 80% complete for SRR1039510_2.fastq.gz
Approx 85% complete for SRR1039510_2.fastq.gz
Approx 90% complete for SRR1039510_2.fastq.gz
Approx 95% complete for SRR1039510_2.fastq.gz
Approx 100% complete for SRR1039510_2.fastq.gz
Analysis complete for SRR1039510_2.fastq.gz
Started analysis of SRR1039511_1.fastq.gz
Approx 5% complete for SRR1039511_1.fastq.gz
Approx 10% complete for SRR1039511_1.fastq.gz
Approx 15% complete for SRR1039511_1.fastq.gz
Approx 20% complete for SRR1039511_1.fastq.gz
Approx 25% complete for SRR1039511_1.fastq.gz
Approx 30% complete for SRR1039511_1.fastq.gz
Approx 35% complete for SRR1039511_1.fastq.gz
Approx 40% complete for SRR1039511_1.fastq.gz
Approx 45% complete for SRR1039511_1.fastq.gz
Approx 50% complete for SRR1039511_1.fastq.gz
Approx 55% complete for SRR1039511_1.fastq.gz
Approx 60% complete for SRR1039511_1.fastq.gz
Approx 65% complete for SRR1039511_1.fastq.gz
Approx 70% complete for SRR1039511_1.fastq.gz
Approx 75% complete for SRR1039511_1.fastq.gz
Approx 80% complete for SRR1039511_1.fastq.gz
Approx 85% complete for SRR1039511_1.fastq.gz
Approx 90% complete for SRR1039511_1.fastq.gz
Approx 95% complete for SRR1039511_1.fastq.gz
Approx 100% complete for SRR1039511_1.fastq.gz
Analysis complete for SRR1039511_1.fastq.gz
Started analysis of SRR1039511_2.fastq.gz
Approx 5% complete for SRR1039511_2.fastq.gz
Approx 10% complete for SRR1039511_2.fastq.gz
Approx 15% complete for SRR1039511_2.fastq.gz
Approx 20% complete for SRR1039511_2.fastq.gz
Approx 25% complete for SRR1039511_2.fastq.gz
Approx 30% complete for SRR1039511_2.fastq.gz
Approx 35% complete for SRR1039511_2.fastq.gz
Approx 40% complete for SRR1039511_2.fastq.gz
Approx 45% complete for SRR1039511_2.fastq.gz
Approx 50% complete for SRR1039511_2.fastq.gz
Approx 55% complete for SRR1039511_2.fastq.gz
Approx 60% complete for SRR1039511_2.fastq.gz
Approx 65% complete for SRR1039511_2.fastq.gz
Approx 70% complete for SRR1039511_2.fastq.gz
Approx 75% complete for SRR1039511_2.fastq.gz
Approx 80% complete for SRR1039511_2.fastq.gz
Approx 85% complete for SRR1039511_2.fastq.gz
Approx 90% complete for SRR1039511_2.fastq.gz
Approx 95% complete for SRR1039511_2.fastq.gz
Approx 100% complete for SRR1039511_2.fastq.gz
Analysis complete for SRR1039511_2.fastq.gz
Started analysis of SRR1039512_1.fastq.gz
Approx 5% complete for SRR1039512_1.fastq.gz
Approx 10% complete for SRR1039512_1.fastq.gz
Approx 15% complete for SRR1039512_1.fastq.gz
Approx 20% complete for SRR1039512_1.fastq.gz
Approx 25% complete for SRR1039512_1.fastq.gz
Approx 30% complete for SRR1039512_1.fastq.gz
Approx 35% complete for SRR1039512_1.fastq.gz
Approx 40% complete for SRR1039512_1.fastq.gz
Approx 45% complete for SRR1039512_1.fastq.gz
Approx 50% complete for SRR1039512_1.fastq.gz
Approx 55% complete for SRR1039512_1.fastq.gz
Approx 60% complete for SRR1039512_1.fastq.gz
Approx 65% complete for SRR1039512_1.fastq.gz
Approx 70% complete for SRR1039512_1.fastq.gz
Approx 75% complete for SRR1039512_1.fastq.gz
Approx 80% complete for SRR1039512_1.fastq.gz
Approx 85% complete for SRR1039512_1.fastq.gz
Approx 90% complete for SRR1039512_1.fastq.gz
Approx 95% complete for SRR1039512_1.fastq.gz
Approx 100% complete for SRR1039512_1.fastq.gz
Analysis complete for SRR1039512_1.fastq.gz
Started analysis of SRR1039512_2.fastq.gz
Approx 5% complete for SRR1039512_2.fastq.gz
Approx 10% complete for SRR1039512_2.fastq.gz
Approx 15% complete for SRR1039512_2.fastq.gz
Approx 20% complete for SRR1039512_2.fastq.gz
Approx 25% complete for SRR1039512_2.fastq.gz
Approx 30% complete for SRR1039512_2.fastq.gz
Approx 35% complete for SRR1039512_2.fastq.gz
Approx 40% complete for SRR1039512_2.fastq.gz
Approx 45% complete for SRR1039512_2.fastq.gz
Approx 50% complete for SRR1039512_2.fastq.gz
Approx 55% complete for SRR1039512_2.fastq.gz
Approx 60% complete for SRR1039512_2.fastq.gz
Approx 65% complete for SRR1039512_2.fastq.gz
Approx 70% complete for SRR1039512_2.fastq.gz
Approx 75% complete for SRR1039512_2.fastq.gz
Approx 80% complete for SRR1039512_2.fastq.gz
Approx 85% complete for SRR1039512_2.fastq.gz
Approx 90% complete for SRR1039512_2.fastq.gz
Approx 95% complete for SRR1039512_2.fastq.gz
Approx 100% complete for SRR1039512_2.fastq.gz
Analysis complete for SRR1039512_2.fastq.gz
(rna) Mar23 21:08:22 ~/Data/rawdata/qc
$ ls
SRR1039510_1_fastqc.html SRR1039511_2_fastqc.html
SRR1039510_1_fastqc.zip SRR1039511_2_fastqc.zip
SRR1039510_2_fastqc.html SRR1039512_1_fastqc.html
SRR1039510_2_fastqc.zip SRR1039512_1_fastqc.zip
SRR1039511_1_fastqc.html SRR1039512_2_fastqc.html
SRR1039511_1_fastqc.zip SRR1039512_2_fastqc.zip
(rna) Mar23 21:17:55 ~/Data/rawdata/qc
$ vim qc.sh
qcdir=/trainee2/Mar23/Data/rawdata/qc/
fqdir=/trainee2/Mar23/Data/rawdata/fq/
fastqc -t 10 -o $qcdir $fqdir/SRR*.fastq.gz
touch finished.ok
(rna) Mar23 21:16:49 ~/Data/rawdata/qc
$ nohup sh qc.sh >qc.log &
[1] 32657
(rna) Mar23 21:17:45 ~/Data/rawdata/qc
$ nohup: ignoring input and redirecting stderr to stdout
[1]+ Done nohup sh qc.sh > qc.log
(rna) Mar23 21:17:53 ~/Data/rawdata/qc
$ ls
finished.ok SRR1039511_1_fastqc.zip
qc.log SRR1039511_2_fastqc.html
qc.sh SRR1039511_2_fastqc.zip
SRR1039510_1_fastqc.html SRR1039512_1_fastqc.html
SRR1039510_1_fastqc.zip SRR1039512_1_fastqc.zip
SRR1039510_2_fastqc.html SRR1039512_2_fastqc.html
SRR1039510_2_fastqc.zip SRR1039512_2_fastqc.zip
SRR1039511_1_fastqc.html
(rna) Mar23 21:20:52 ~/Data/rawdata/qc
$ ll
total 5580
drwxrwxr-x 2 Mar23 Mar23 4096 Apr 15 21:20 ./
drwxrwxr-x 5 Mar23 Mar23 4096 Apr 15 20:50 ../
-rw-rw-r-- 1 Mar23 Mar23 0 Apr 15 21:17 finished.ok
-rw-rw-r-- 1 Mar23 Mar23 6036 Apr 15 21:17 qc.log
-rw-rw-r-- 1 Mar23 Mar23 141 Apr 15 21:20 qc.sh
-rw-rw-r-- 1 Mar23 Mar23 631202 Apr 15 21:17 SRR1039510_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 314996 Apr 15 21:17 SRR1039510_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 633039 Apr 15 21:17 SRR1039510_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 312600 Apr 15 21:17 SRR1039510_2_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 634543 Apr 15 21:17 SRR1039511_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 310070 Apr 15 21:17 SRR1039511_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 634839 Apr 15 21:17 SRR1039511_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 311681 Apr 15 21:17 SRR1039511_2_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 632535 Apr 15 21:17 SRR1039512_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 307280 Apr 15 21:17 SRR1039512_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 634147 Apr 15 21:17 SRR1039512_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 312822 Apr 15 21:17 SRR1039512_2_fastqc.zip
(rna) Mar23 21:21:25 ~/Data/rawdata/qc
$ less qc.log
(rna) Mar23 21:24:32 ~/Data/rawdata/qc
$ cat qc.sh
qcdir=/trainee2/Mar23/Data/rawdata/qc/
fqdir=/trainee2/Mar23/Data/rawdata/fq/
fastqc -t 10 -o $qcdir $fqdir/SRR*.fastq.gz
touch finished.ok
(rna) Mar23 21:25:11 ~/Data/rawdata/qc
$ ls
finished.ok SRR1039511_1_fastqc.zip
qc.log SRR1039511_2_fastqc.html
qc.sh SRR1039511_2_fastqc.zip
SRR1039510_1_fastqc.html SRR1039512_1_fastqc.html
SRR1039510_1_fastqc.zip SRR1039512_1_fastqc.zip
SRR1039510_2_fastqc.html SRR1039512_2_fastqc.html
SRR1039510_2_fastqc.zip SRR1039512_2_fastqc.zip
SRR1039511_1_fastqc.html
(rna) Mar23 21:25:29 ~/Data/rawdata/qc
$ multiqc *.zip
[WARNING] multiqc : MultiQC Version v1.10.1 now available!
[INFO ] multiqc : This is MultiQC v1.10
[INFO ] multiqc : Template : default
[INFO ] multiqc : Searching : /trainee2/Mar23/Data/rawdata/qc/SRR1039510_1_fastqc.zip
[INFO ] multiqc : Searching : /trainee2/Mar23/Data/rawdata/qc/SRR1039510_2_fastqc.zip
[INFO ] multiqc : Searching : /trainee2/Mar23/Data/rawdata/qc/SRR1039511_1_fastqc.zip
[INFO ] multiqc : Searching : /trainee2/Mar23/Data/rawdata/qc/SRR1039511_2_fastqc.zip
[INFO ] multiqc : Searching : /trainee2/Mar23/Data/rawdata/qc/SRR1039512_1_fastqc.zip
[INFO ] multiqc : Searching : /trainee2/Mar23/Data/rawdata/qc/SRR1039512_2_fastqc.zip
Searching ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 6/6
[INFO ] fastqc : Found 6 reports
[INFO ] multiqc : Compressing plot data
[INFO ] multiqc : Report : multiqc_report.html
[INFO ] multiqc : Data : multiqc_data
[INFO ] multiqc : MultiQC complete
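Besides the merged HTML report, each FastQC zip archive contains a summary.txt with one tab-separated row per module (status, module name, input file). As a rough sketch, the hypothetical helper flag_modules below lists every row whose status is not PASS; the demo rows are made up for illustration and are not results from this run.

```shell
#!/bin/sh
# Hypothetical helper: read FastQC summary.txt content on stdin and print
# the modules that did not PASS. Columns: status, module, file (tab-separated).
flag_modules() {
    awk -F '\t' '$1 != "PASS" { print $3 ": " $1 " (" $2 ")" }'
}

# Synthetic example rows, not taken from the reports above
printf 'PASS\tBasic Statistics\tdemo_1.fastq.gz\nWARN\tPer base sequence content\tdemo_1.fastq.gz\n' |
    flag_modules
```

On real data, and assuming unzip is installed, the input could come from `unzip -p SRR1039510_1_fastqc.zip '*/summary.txt' | flag_modules`.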
(rna) Mar23 22:39:16 ~/Data/rawdata/qc
$ vim qc.sh
# define the input and output directories
qcdir=/trainee2/Mar23/Data/rawdata/qc/
fqdir=/trainee2/Mar23/Data/rawdata/fq/
# fastqc analysis
fastqc -t 10 -o $qcdir $fqdir/SRR*.fastq.gz
# merge the reports
multiqc *.zip
touch finished.ok
(rna) Mar23 22:42:35 ~/Data/rawdata/qc
$ cat qc.sh
# define the input and output directories
qcdir=/trainee2/Mar23/Data/rawdata/qc/
fqdir=/trainee2/Mar23/Data/rawdata/fq/
# fastqc analysis
fastqc -t 10 -o $qcdir $fqdir/SRR*.fastq.gz
# merge the reports
multiqc *.zip
touch finished.ok
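In qc.sh as written, touch finished.ok runs even if fastqc or multiqc fails, and multiqc *.zip depends on the directory the script is started from. One possible hardening is sketched below; run_qc is a hypothetical function name, the paths are the ones used in this session, and the guard simply skips the run when the tools are not on PATH.

```shell
#!/bin/sh
# Hypothetical variant of qc.sh: stop at the first failing step so that
# finished.ok is only written when both tools succeeded.
# $1 = FASTQ directory, $2 = QC output directory
run_qc() {
    qc_in=$1
    qc_out=$2
    mkdir -p "$qc_out" || return 1
    fastqc -t 10 -o "$qc_out" "$qc_in"/SRR*.fastq.gz || return 1
    # search the FastQC output directory explicitly and write the report there
    multiqc -o "$qc_out" "$qc_out" || return 1
    touch "$qc_out/finished.ok"
}

# Only run when both tools are available (e.g. inside the rna environment)
if command -v fastqc >/dev/null 2>&1 && command -v multiqc >/dev/null 2>&1; then
    run_qc /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/rawdata/qc ||
        echo "QC failed; finished.ok not written" >&2
else
    echo "fastqc or multiqc not found; skipping" >&2
fi
```

With this layout the presence of finished.ok can be used as a success marker by later steps.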
Data filtering
(rna) Mar23 22:57:50 ~/Data
$ cd cleandata/
(rna) Mar23 22:58:17 ~/Data/cleandata
$ mkdir trim_galore
(rna) Mar23 23:00:55 ~/Data/cleandata
$ cd trim_galore/
(rna) Mar23 23:01:34 ~/Data/cleandata/trim_galore
$ zless -S ../../rawdata/fq/SRR1039510_1.fastq.gz | grep 'AGATCGGAAGAGC'
CAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCT
AATCGGGGCTGGAGGCACTTCAGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTG
GATCGAGTATAAAGGGAATTGCCTCCCACCCCTGCCTCTGCCAGATCGGAAGAGCACACGTCT
GGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTG
TGGACACCAAGATCACATGGCCCAATGGCCTGACGCTGGAGATCGGAAGAGCACACGTCTGAA
TCCCTGATGTGAATGTAAACTTGAAAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAG
GACGCGCAGACAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCC
CCTTGGCTCGGGCTCATCGTGCTCCTGGGCAGCTAGATCGGAAGAGCACACGTCTGAACTCCA
CCAGGCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATATCGGATGCCGTCT
GCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTG
CCCAGATCGGAAGAGCACACGTCCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCT
CAGCACAGCCTCTCCTGCGGGCCAGCGTCATCAAGAAAACATCAGATCGGAAGAGCACACGTC
CCAGCAACTTTTTGAAACTAAAGGCGCTTTCCGCCATCACCGCCACTGGCAGATCGGAAGAGC
ACACGTCTGAACTCCAGTCACACAGTGATCTCTATGCCGTCTTCTGCTTGAGATCGGAAGAGC
CTCTCCTGGAGGTTTCCAGTAGCACTACTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCA
ACACGTCTGAACTCCAGTCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCT
TGGACAGGGTTTCTCCGAGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCT
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTT
CACCGTTTTTGTGGTTAGCTCCTTCTTGCCAACCAACCATGAGCTCCCAGATCGGAAGAGCAC
CAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCT
TCAGCTTGCTCATCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTA
CCTGTCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTT
GGACCAGCCACTGTGGCAGATGGGAGCCAAGAGATCGGAAGAGCACACGTCTGAACTCCAGTC
GTGTCGGGGCGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCC
GCACAGAGTGTAGATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTA
GGTGTGGTAGATCCGTGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCG
ACACGTCTGAACTCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTA
CCCAGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTG
(rna) Mar23 23:04:00 ~/Data/cleandata/trim_galore
$ zless -S ../../rawdata/fq/SRR1039510_1.fastq.gz | grep 'AGATCGGAAGAGC'| wc -l
28 # 28 reads contain the adapter sequence
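The grep above scans every line of the file, including headers and quality strings. A slightly stricter count, sketched here with a hypothetical helper count_adapter_reads and a synthetic two-read file, looks only at the sequence line of each four-line record:

```shell
#!/bin/sh
# Hypothetical helper: count reads whose sequence line contains the adapter.
# $1 = gzipped FASTQ file, $2 = adapter sequence
count_adapter_reads() {
    gzip -dc "$1" |
        awk -v ad="$2" 'NR % 4 == 2 && index($0, ad) { n++ } END { print n + 0 }'
}

# Synthetic demo: read r1 carries the adapter, read r2 does not
demo=$(mktemp -d)/demo.fastq.gz
printf '@r1\nACGTAGATCGGAAGAGCAA\n+\nIIIIIIIIIIIIIIIIIII\n@r2\nTTGACCA\n+\nIIIIIII\n' |
    gzip -c > "$demo"
count_adapter_reads "$demo" AGATCGGAAGAGC
```

This counts exact full-length matches only, like the grep it replaces.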
(rna) Mar23 23:07:45 ~/Data/cleandata/trim_galore
$ rawdata=/trainee2/Mar23/Data/rawdata/fq/
(rna) Mar23 23:10:36 ~/Data/cleandata/trim_galore
$ cleandata=/trainee2/Mar23/Data/cleandata/trim_galore/
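With both directories defined, the paired-end trimming command used in this section can be generated for every accession rather than typed per sample. The sketch below is an assumption-laden illustration: make_trim_script is a hypothetical name, the options are copied from the single-sample call in this section, and the file names follow the _1/_2 suffixes that fastq-dump produced earlier.

```shell
#!/bin/sh
# Hypothetical helper: read accessions on stdin and print one paired-end
# trim_galore command per accession. $1 = raw FASTQ dir, $2 = output dir.
make_trim_script() {
    in_dir=$1
    out_dir=$2
    while read -r id; do
        [ -n "$id" ] || continue    # skip blank lines
        # --paired needs both mates, read 1 first and read 2 second
        echo "trim_galore --phred33 -q 20 --length 36 --stringency 3 --fastqc --paired --max_n 3 -o ${out_dir} ${in_dir}/${id}_1.fastq.gz ${in_dir}/${id}_2.fastq.gz"
    done
}

# Demo with the three accessions from sample.ID
printf 'SRR1039510\nSRR1039511\nSRR1039512\n' |
    make_trim_script /trainee2/Mar23/Data/rawdata/fq /trainee2/Mar23/Data/cleandata/trim_galore
```

Redirecting the output to a file (for example trim.sh, a name chosen here for illustration) gives a script that can be launched with nohup in the same way as sra2fq.sh.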
(rna) Mar23 23:11:04 ~/Data/cleandata/trim_galore
$ trim_galore --phred33 -q 20 --length 36 --stringency 3 --fastqc --paired --max_n 3 -o $cleandata $rawdata/SRR1039510_1.fastq.gz $rawdata/SRR1039510_2.fastq.gz
Multicore support not enabled. Proceeding with single-core trimming.
Path to Cutadapt set as: 'cutadapt' (default)
Cutadapt seems to be working fine (tested command 'cutadapt --version')
Cutadapt version: 3.3
single-core operation.
Output will be written into the directory: /trainee2/Mar23/Data/cleandata/trim_galore/
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz <<)
Found perfect matches for the following adapter sequences:
Adapter type Count Sequence Sequences analysed Percentage
Illumina 28 AGATCGGAAGAGC 25000 0.11
smallRNA 0 TGGAATTCTCGG 25000 0.00
Nextera 0 CTGTCTCTTATA 25000 0.00
Using Illumina adapter for trimming (count: 28). Second best hit was smallRNA (count: 0)
Writing report to '/trainee2/Mar23/Data/cleandata/trim_galore/SRR1039510_1.fastq.gz_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.6
Cutadapt version: 3.3
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Maximum number of tolerated Ns: 3
Minimum required adapter overlap (stringency): 3 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 36 bp
Running FastQC on the data once trimming has completed
Output file(s) will be GZIP compressed
Cutadapt seems to be fairly up-to-date (version 3.3). Setting -j 1
Writing final adapter and quality trimmed output to SRR1039510_1_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz <<<
This is cutadapt 3.3 with Python 3.7.0
Command line parameters: -j 1 -e 0.1 -q 20 -O 3 -a AGATCGGAAGAGC /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 0.38 s (15 µs/read; 3.94 M reads/minute).
=== Summary ===
Total reads processed: 25,000
Reads with adapters: 714 (2.9%)
Reads written (passing filters): 25,000 (100.0%)
Total basepairs processed: 1,575,000 bp
Quality-trimmed: 13,073 bp (0.8%)
Total written (filtered): 1,558,267 bp (98.9%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 714 times
No. of allowed errors:
1-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 30.3%
C: 30.0%
G: 26.5%
T: 13.0%
none/other: 0.3%
Overview of removed sequences
length count expect max.err error counts
3 524 390.6 0 524
4 119 97.7 0 119
5 27 24.4 0 27
6 3 6.1 0 3
7 2 1.5 0 2
8 1 0.4 0 1
11 2 0.0 1 2
12 3 0.0 1 2 1
13 2 0.0 1 2
15 1 0.0 1 1
20 1 0.0 1 1
21 2 0.0 1 1 1
23 1 0.0 1 0 1
24 1 0.0 1 1
29 1 0.0 1 1
32 1 0.0 1 1
33 1 0.0 1 1
38 2 0.0 1 2
39 1 0.0 1 1
40 1 0.0 1 1
41 1 0.0 1 1
44 2 0.0 1 2
46 1 0.0 1 1
48 2 0.0 1 2
52 2 0.0 1 1 1
57 1 0.0 1 1
58 2 0.0 1 2
60 2 0.0 1 2
62 3 0.0 1 2 1
63 2 0.0 1 1 1
RUN STATISTICS FOR INPUT FILE: /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
=============================================
25000 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)
Writing report to '/trainee2/Mar23/Data/cleandata/trim_galore/SRR1039510_2.fastq.gz_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: /trainee2/Mar23/Data/rawdata/fq//SRR1039510_2.fastq.gz
Trimming mode: paired-end
Trim Galore version: 0.6.6
Cutadapt version: 3.3
Number of cores used for trimming: 1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Maximum number of tolerated Ns: 3
Minimum required adapter overlap (stringency): 3 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 36 bp
Running FastQC on the data once trimming has completed
Output file(s) will be GZIP compressed
Cutadapt seems to be fairly up-to-date (version 3.3). Setting -j -j 1
Writing final adapter and quality trimmed output to SRR1039510_2_trimmed.fq.gz
>>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file /trainee2/Mar23/Data/rawdata/fq//SRR1039510_2.fastq.gz <<<
This is cutadapt 3.3 with Python 3.7.0
Command line parameters: -j 1 -e 0.1 -q 20 -O 3 -a AGATCGGAAGAGC /trainee2/Mar23/Data/rawdata/fq//SRR1039510_2.fastq.gz
Processing reads on 1 core in single-end mode ...
Finished in 0.38 s (15 µs/read; 3.92 M reads/minute).
=== Summary ===
Total reads processed: 25,000
Reads with adapters: 699 (2.8%)
Reads written (passing filters): 25,000 (100.0%)
Total basepairs processed: 1,575,000 bp
Quality-trimmed: 25,440 bp (1.6%)
Total written (filtered): 1,545,973 bp (98.2%)
=== Adapter 1 ===
Sequence: AGATCGGAAGAGC; Type: regular 3'; Length: 13; Trimmed: 699 times
No. of allowed errors:
1-9 bp: 0; 10-13 bp: 1
Bases preceding removed adapters:
A: 30.8%
C: 32.5%
G: 22.9%
T: 13.6%
none/other: 0.3%
Overview of removed sequences
length count expect max.err error counts
3 507 390.6 0 507
4 126 97.7 0 126
5 28 24.4 0 28
6 1 6.1 0 1
7 1 1.5 0 1
10 1 0.0 1 1
11 3 0.0 1 2 1
13 1 0.0 1 1
15 1 0.0 1 1
20 1 0.0 1 1
21 2 0.0 1 2
23 1 0.0 1 0 1
24 1 0.0 1 1
25 1 0.0 1 1
29 2 0.0 1 2
33 1 0.0 1 1
38 1 0.0 1 1
40 1 0.0 1 1
43 1 0.0 1 1
44 2 0.0 1 2
46 1 0.0 1 1
48 2 0.0 1 2
50 1 0.0 1 1
51 1 0.0 1 1
52 1 0.0 1 1
57 1 0.0 1 1
60 4 0.0 1 4
62 3 0.0 1 3
63 2 0.0 1 1 1
RUN STATISTICS FOR INPUT FILE: /trainee2/Mar23/Data/rawdata/fq//SRR1039510_2.fastq.gz
=============================================
25000 sequences processed in total
The length threshold of paired-end sequences gets evaluated later on (in the validation step)
Validate paired-end files SRR1039510_1_trimmed.fq.gz and SRR1039510_2_trimmed.fq.gz
file_1: SRR1039510_1_trimmed.fq.gz, file_2: SRR1039510_2_trimmed.fq.gz
>>>>> Now validing the length of the 2 paired-end infiles: SRR1039510_1_trimmed.fq.gz and SRR1039510_2_trimmed.fq.gz <<<<<
Writing validated paired-end Read 1 reads to SRR1039510_1_val_1.fq.gz
Writing validated paired-end Read 2 reads to SRR1039510_2_val_2.fq.gz
Total number of sequences analysed: 25000
Number of sequence pairs removed because at least one read was shorter than the length cutoff (36 bp): 626 (2.50%)
Number of sequence pairs removed because at least one read contained more N(s) than the specified limit of 3: 89 (0.36%)
>>> Now running FastQC on the validated data SRR1039510_1_val_1.fq.gz<<<
Started analysis of SRR1039510_1_val_1.fq.gz
Approx 5% complete for SRR1039510_1_val_1.fq.gz
Approx 10% complete for SRR1039510_1_val_1.fq.gz
Approx 15% complete for SRR1039510_1_val_1.fq.gz
Approx 20% complete for SRR1039510_1_val_1.fq.gz
Approx 25% complete for SRR1039510_1_val_1.fq.gz
Approx 30% complete for SRR1039510_1_val_1.fq.gz
Approx 35% complete for SRR1039510_1_val_1.fq.gz
Approx 40% complete for SRR1039510_1_val_1.fq.gz
Approx 45% complete for SRR1039510_1_val_1.fq.gz
Approx 50% complete for SRR1039510_1_val_1.fq.gz
Approx 55% complete for SRR1039510_1_val_1.fq.gz
Approx 60% complete for SRR1039510_1_val_1.fq.gz
Approx 65% complete for SRR1039510_1_val_1.fq.gz
Approx 70% complete for SRR1039510_1_val_1.fq.gz
Approx 75% complete for SRR1039510_1_val_1.fq.gz
Approx 80% complete for SRR1039510_1_val_1.fq.gz
Approx 85% complete for SRR1039510_1_val_1.fq.gz
Approx 90% complete for SRR1039510_1_val_1.fq.gz
Approx 95% complete for SRR1039510_1_val_1.fq.gz
Analysis complete for SRR1039510_1_val_1.fq.gz
>>> Now running FastQC on the validated data SRR1039510_2_val_2.fq.gz<<<
Started analysis of SRR1039510_2_val_2.fq.gz
Approx 5% complete for SRR1039510_2_val_2.fq.gz
Approx 10% complete for SRR1039510_2_val_2.fq.gz
Approx 15% complete for SRR1039510_2_val_2.fq.gz
Approx 20% complete for SRR1039510_2_val_2.fq.gz
Approx 25% complete for SRR1039510_2_val_2.fq.gz
Approx 30% complete for SRR1039510_2_val_2.fq.gz
Approx 35% complete for SRR1039510_2_val_2.fq.gz
Approx 40% complete for SRR1039510_2_val_2.fq.gz
Approx 45% complete for SRR1039510_2_val_2.fq.gz
Approx 50% complete for SRR1039510_2_val_2.fq.gz
Approx 55% complete for SRR1039510_2_val_2.fq.gz
Approx 60% complete for SRR1039510_2_val_2.fq.gz
Approx 65% complete for SRR1039510_2_val_2.fq.gz
Approx 70% complete for SRR1039510_2_val_2.fq.gz
Approx 75% complete for SRR1039510_2_val_2.fq.gz
Approx 80% complete for SRR1039510_2_val_2.fq.gz
Approx 85% complete for SRR1039510_2_val_2.fq.gz
Approx 90% complete for SRR1039510_2_val_2.fq.gz
Approx 95% complete for SRR1039510_2_val_2.fq.gz
Analysis complete for SRR1039510_2_val_2.fq.gz
Deleting both intermediate output files SRR1039510_1_trimmed.fq.gz and SRR1039510_2_trimmed.fq.gz
====================================================================================================
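The percentages in the Cutadapt summary above can be recomputed from the raw counts, which is a quick way to check how a report should be read. The sketch below is only an illustration: `pct` is a hypothetical helper name, not part of Trim Galore or Cutadapt, and the numbers are copied from the SRR1039510_2 summary.

```shell
# pct: hypothetical helper; prints part/total as a percentage with one decimal.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f%%\n", 100 * part / total }'
}

pct 699 25000         # reads with adapters
pct 25440 1575000     # quality-trimmed bases
pct 1545973 1575000   # bases written after filtering
```

Dividing the total base pairs by the read count (1,575,000 / 25,000) also gives the raw read length of 63 bp.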
(rna) Mar23 23:11:54 ~/Data/cleandata/trim_galore
$ ls
SRR1039510_1.fastq.gz_trimming_report.txt
SRR1039510_1_val_1_fastqc.html
SRR1039510_1_val_1_fastqc.zip
SRR1039510_1_val_1.fq.gz
SRR1039510_2.fastq.gz_trimming_report.txt
SRR1039510_2_val_2_fastqc.html
SRR1039510_2_val_2_fastqc.zip
SRR1039510_2_val_2.fq.gz
(rna) Mar23 23:12:56 ~/Data/cleandata/trim_galore
$ ll
total 4400
drwxrwxr-x 2 Mar23 Mar23 4096 Apr 15 23:11 ./
drwxrwxr-x 4 Mar23 Mar23 4096 Apr 15 22:59 ../
-rw-rw-r-- 1 Mar23 Mar23 2235 Apr 15 23:11 SRR1039510_1.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23 641928 Apr 15 23:11 SRR1039510_1_val_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 316209 Apr 15 23:11 SRR1039510_1_val_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1274714 Apr 15 23:11 SRR1039510_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 2536 Apr 15 23:11 SRR1039510_2.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23 648744 Apr 15 23:11 SRR1039510_2_val_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 314805 Apr 15 23:11 SRR1039510_2_val_2_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1279918 Apr 15 23:11 SRR1039510_2_val_2.fq.gz
(rna) Mar23 23:28:53 ~/Data/cleandata/trim_galore
$ cat /trainee2/Mar23/Data/rawdata/fq/sample.ID | while read id
> do
> echo "trim_galore --phred33 -q 20 --length 36 --stringency 3 --fastqc --paired --max_n 3 -o ${cleandata} ${rawdata}/${id}_1.fastq.gz ${rawdata}/${id}_2.fastq.gz"
> done >trim_galore.sh
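The loop above writes one trim_galore command per sample ID into trim_galore.sh, and it depends on the shell variables `rawdata` and `cleandata` having been set earlier in the session. A minimal sketch of the same idea with those directories passed explicitly is given below. `make_trim_cmds` is a hypothetical helper name, and the paths in the usage comment are assumptions based on the paths printed in the log.

```shell
# make_trim_cmds: hypothetical helper; prints one trim_galore command per sample ID.
# Arguments: sample-ID file, raw fastq directory, output directory.
make_trim_cmds() {
    ids_file=$1
    rawdata=$2
    cleandata=$3
    # read -r keeps backslashes literal; "|| [ -n "$id" ]" keeps a last line
    # that has no trailing newline
    while read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue   # skip blank lines
        echo "trim_galore --phred33 -q 20 --length 36 --stringency 3 --fastqc --paired --max_n 3 -o ${cleandata} ${rawdata}/${id}_1.fastq.gz ${rawdata}/${id}_2.fastq.gz"
    done < "$ids_file"
}

# Example use (paths are assumptions):
# make_trim_cmds sample.ID /trainee2/Mar23/Data/rawdata/fq ~/Data/cleandata/trim_galore > trim_galore.sh
```

Reading the file with a redirect instead of `cat ... |` avoids an extra process, and passing the directories as arguments makes an unset variable easier to notice before the generated script is run.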
(rna) Mar23 23:29:27 ~/Data/cleandata/trim_galore
$ ls
SRR1039510_1.fastq.gz_trimming_report.txt
SRR1039510_1_val_1_fastqc.html
SRR1039510_1_val_1_fastqc.zip
SRR1039510_1_val_1.fq.gz
SRR1039510_2.fastq.gz_trimming_report.txt
SRR1039510_2_val_2_fastqc.html
SRR1039510_2_val_2_fastqc.zip
SRR1039510_2_val_2.fq.gz
trim_galore.sh
(rna) Mar23 23:29:50 ~/Data/cleandata/trim_galore
$ less trim_galore.sh
(rna) Mar23 23:30:41 ~/Data/cleandata/trim_galore
$ ls /trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
/trainee2/Mar23/Data/rawdata/fq//SRR1039510_1.fastq.gz
(rna) Mar23 23:30:45 ~/Data/cleandata/trim_galore
$ nohup sh trim_galore.sh >trim_galore.log &
[1] 21186
nohup: ignoring input and redirecting stderr to stdout
(rna) Mar23 23:30:55 ~/Data/cleandata/trim_galore
$ ll
total 9856
drwxrwxr-x 2 Mar23 Mar23 4096 Apr 15 23:31 ./
drwxrwxr-x 4 Mar23 Mar23 4096 Apr 15 22:59 ../
-rw-rw-r-- 1 Mar23 Mar23 2234 Apr 15 23:30 SRR1039510_1.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23 641928 Apr 15 23:30 SRR1039510_1_val_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 316209 Apr 15 23:30 SRR1039510_1_val_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1274714 Apr 15 23:30 SRR1039510_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 2535 Apr 15 23:30 SRR1039510_2.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23 648744 Apr 15 23:31 SRR1039510_2_val_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 314805 Apr 15 23:31 SRR1039510_2_val_2_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1279918 Apr 15 23:30 SRR1039510_2_val_2.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 2161 Apr 15 23:31 SRR1039511_1.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23 647884 Apr 15 23:31 SRR1039511_1_val_1_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 317675 Apr 15 23:31 SRR1039511_1_val_1_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1290072 Apr 15 23:31 SRR1039511_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 2494 Apr 15 23:31 SRR1039511_2.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23 646340 Apr 15 23:31 SRR1039511_2_val_2_fastqc.html
-rw-rw-r-- 1 Mar23 Mar23 317650 Apr 15 23:31 SRR1039511_2_val_2_fastqc.zip
-rw-rw-r-- 1 Mar23 Mar23 1269066 Apr 15 23:31 SRR1039511_2_val_2.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 757 Apr 15 23:31 SRR1039512_1.fastq.gz_trimming_report.txt
-rw-rw-r-- 1 Mar23 Mar23 1048576 Apr 15 23:31 SRR1039512_1_trimmed.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 20646 Apr 15 23:31 trim_galore.log
-rw-rw-r-- 1 Mar23 Mar23 720 Apr 15 23:29 trim_galore.sh
(rna) Mar23 23:31:08 ~/Data/cleandata/trim_galore
$ less trim_galore.log
[1]+ Done nohup sh trim_galore.sh > trim_galore.log
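Because nohup sent both stdout and stderr to trim_galore.log, that log can be scanned to see how far the batch got. In the single-sample run earlier, each paired-end sample produced two `Analysis complete for ...` lines from FastQC, so a finished batch should contain two per sample. The sketch below assumes that log layout; `count_done` is a hypothetical helper name.

```shell
# count_done: hypothetical helper; counts finished FastQC runs in a trim_galore log.
# Argument: path to the log file written by nohup.
count_done() {
    # grep -c prints the number of matching lines (it prints 0 and returns
    # a non-zero status when nothing matches)
    grep -c '^Analysis complete for ' "$1"
}

# Example use:
# count_done trim_galore.log
```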
(rna) Mar23 23:32:04 ~/Data/cleandata/trim_galore
$ ll *fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1274714 Apr 15 23:30 SRR1039510_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1279918 Apr 15 23:30 SRR1039510_2_val_2.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1290072 Apr 15 23:31 SRR1039511_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1269066 Apr 15 23:31 SRR1039511_2_val_2.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1306804 Apr 15 23:31 SRR1039512_1_val_1.fq.gz
-rw-rw-r-- 1 Mar23 Mar23 1331250 Apr 15 23:31 SRR1039512_2_val_2.fq.g
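The listing taken while the job was still running showed an intermediate SRR1039512_1_trimmed.fq.gz file, so it is worth confirming that every sample ends up with both validated files once the job reports Done. A minimal sketch of such a check follows. `check_pairs` is a hypothetical helper name, and the `_1_val_1.fq.gz` / `_2_val_2.fq.gz` suffixes are taken from the output names shown in the log.

```shell
# check_pairs: hypothetical helper; reports samples lacking either validated file.
# Arguments: sample-ID file, trim_galore output directory.
check_pairs() {
    ids_file=$1
    outdir=$2
    missing=0
    while read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue   # skip blank lines
        for f in "${outdir}/${id}_1_val_1.fq.gz" "${outdir}/${id}_2_val_2.fq.gz"; do
            # -s is true only for files that exist and are not empty
            if [ ! -s "$f" ]; then
                echo "missing or empty: $f"
                missing=$((missing + 1))
            fi
        done
    done < "$ids_file"
    echo "files missing: $missing"
}

# Example use (paths are assumptions):
# check_pairs /trainee2/Mar23/Data/rawdata/fq/sample.ID ~/Data/cleandata/trim_galore
```

This only checks that the files exist and are non-empty; it does not validate the gzip content.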