1. Process per file output
Problem
A task in your workflow produces two or more files at a time. A downstream task needs to process each of these files independently.
When a process produces several output files, applying flatten lets a downstream process handle each file separately.
Solution
Use the flatten operator to transform the outputs of the upstream process to a channel emitting each file separately. Then use this channel as input for the downstream process.
Code
process foo {
    output:
    file '*.txt' into foo_ch

    script:
    '''
    echo Hello there! > file1.txt
    echo What a beautiful day > file2.txt
    echo I wish you are having fun1 > file3.txt
    '''
}

process bar {
    input:
    file x from foo_ch.flatten()

    script:
    """
    cat $x
    """
}
Use the following command to execute the example:
nextflow run patterns/process-per-file-output.nf
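To make the effect of flatten concrete, here is a small plain-Python model of it. This is not Nextflow code: the list foo_ch only stands in for the channel, and the helper flatten is written for this illustration. It shows a single item holding three files turning into three separate items, which is why the downstream task runs once per file.

```python
def flatten(items):
    """Yield every leaf value from arbitrarily nested lists, in order."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item

# The upstream task emits all of its output files together, as one item.
foo_ch = [["file1.txt", "file2.txt", "file3.txt"]]

# After flattening, each file is its own item.
per_file = list(flatten(foo_ch))
print(per_file)
```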
2. Process per file path
Problem
You need to execute a task for each file that matches a glob pattern.
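When a process must run once for every file matching a glob pattern, Channel.fromPath creates a channel that emits each matching file as a separate item, so the files can be processed in parallel.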
Solution
Use the Channel.fromPath method to create a channel emitting all files matching the glob pattern. Then, use the channel as input of the process implementing your task.
Code
Channel.fromPath('reads/*_1.fq.gz').set { samples_ch }

process foo {
    input:
    file x from samples_ch

    script:
    """
    your_command --input $x
    """
}
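Before running the workflow it can help to check which files a glob pattern would select. The sketch below does that in plain Python with pathlib. The file names are invented for the illustration, and Python's matching rules are only an approximation of the glob syntax Nextflow accepts.

```python
from pathlib import PurePosixPath

# Hypothetical directory listing, invented for this illustration.
candidates = [
    "reads/sampleA_1.fq.gz",
    "reads/sampleA_2.fq.gz",
    "reads/sampleB_1.fq.gz",
    "reads/notes.txt",
]

pattern = "reads/*_1.fq.gz"

# Each matching path would become one channel item, and so one task execution.
matched = [p for p in candidates if PurePosixPath(p).match(pattern)]
print(matched)
```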
3. Process per file chunk
Problem
You need to split one or more input files into chunks and execute a task for each of them.
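Given an input file that must be split into smaller pieces of a given size, with each piece processed separately, use splitText to do the splitting and feed the resulting channel to the process.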
Solution
Use the splitText operator to split a file into chunks of a given size. Then use the resulting channel as input for the process implementing your task.
Caveat: By default, chunks are kept in memory. When splitting big files, specify the parameter file: true to save the chunks into files. See the documentation for details.
Splitters for specific file formats are also available, e.g. splitFasta and splitFastq.
Code
Channel
    .fromPath('poem.txt')
    .splitText(by: 5)
    .set { chunks_ch }

process foo {
    echo true

    input:
    file x from chunks_ch

    script:
    """
    rev $x | rev
    """
}
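The chunking performed by splitText(by: 5) can be pictured with a short plain-Python model. The function split_text and the 12-line sample text are hypothetical stand-ins for the operator and for poem.txt. The sketch assumes that leftover lines are emitted as a final, shorter chunk.

```python
def split_text(text, by):
    """Split text into chunks of `by` lines each; the last chunk may be shorter."""
    lines = text.splitlines(keepends=True)
    return ["".join(lines[i:i + by]) for i in range(0, len(lines), by)]

# Hypothetical 12-line stand-in for poem.txt.
poem = "".join(f"line {n}\n" for n in range(1, 13))

chunks = split_text(poem, by=5)

# Number of lines per chunk; each chunk would trigger one task execution.
lines_per_chunk = [chunk.count("\n") for chunk in chunks]
print(lines_per_chunk)
```

With 12 input lines and by=5, the model yields three chunks, so the process would run three times.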
4. Process per file pairs
Problem
You need to process the files in a directory, grouping them in pairs.
In short, Channel.fromFilePairs emits one tuple per file pair, keyed by the prefix the two file names share.
Solution
Use the Channel.fromFilePairs method to create a channel emitting the file pairs matching a glob pattern. The pattern must match a common prefix in the paired file names.
The matching files are emitted as tuples in which the first element is the grouping key of the matching files and the second element is the file pair itself.
Code
Channel
    .fromFilePairs('reads/*_{1,2}.fq.gz')
    .set { samples_ch }

process foo {
    input:
    set sampleId, file(reads) from samples_ch

    script:
    """
    your_command --sample $sampleId --reads $reads
    """
}
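The shape of the emitted tuples is easier to see with a plain-Python model of the grouping. The file names are invented, and the regular expression is a hand-written equivalent of the glob *_{1,2}.fq.gz. For this pattern the sketch assumes the grouping key is the text matched by the * wildcard.

```python
import re
from collections import defaultdict

# Hypothetical file names, invented for this illustration.
files = [
    "sampleA_1.fq.gz", "sampleA_2.fq.gz",
    "sampleB_1.fq.gz", "sampleB_2.fq.gz",
]

# Regex equivalent of the glob '*_{1,2}.fq.gz'; group 1 is the grouping key.
pair_pattern = re.compile(r"(.*)_[12]\.fq\.gz")

groups = defaultdict(list)
for name in sorted(files):
    m = pair_pattern.fullmatch(name)
    if m:
        groups[m.group(1)].append(name)

# One (key, [file1, file2]) tuple per complete pair, like the items in samples_ch.
pairs = [(key, members) for key, members in groups.items() if len(members) == 2]
print(pairs)
```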
Custom grouping strategy
When needed it is possible to define a custom grouping strategy. A common use case is for alignment BAM files (sample1.bam) that come along with their index file. The difficulty is that the index is sometimes called sample1.bai and sometimes sample1.bam.bai depending on the software used. The following example can accommodate both cases.
Channel
    .fromFilePairs('alignment/*.{bam,bai}') { file -> file.name.replaceAll(/\.bam|\.bai$/, '') }
    .set { samples_ch }

process foo {
    input:
    set sampleId, file(bam) from samples_ch

    script:
    """
    your_command --sample $sampleId --bam ${sampleId}.bam
    """
}
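To check that the grouping closure really maps both index naming conventions to the same key, the same substitution can be tried in plain Python. The helper grouping_key is written for this illustration. Its dots are escaped so that they match a literal "." rather than any character.

```python
import re

def grouping_key(file_name):
    """Strip a '.bam' extension and/or a trailing '.bai' extension."""
    return re.sub(r"\.bam|\.bai$", "", file_name)

names = ["sample1.bam", "sample1.bai", "sample1.bam.bai"]
keys = {name: grouping_key(name) for name in names}
print(keys)
```

All three names reduce to sample1, so the BAM file and its index end up in the same tuple whichever naming convention the index follows.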