Broad Institute Video Notes: Hello World WDL Tutorial

These are my notes on the final video in the Broad Institute team's GATK series, which shows students how to run tasks with WDL. The GATK team has also published a WDL tutorial online.

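The Workflow Description Language (WDL), developed at the Broad Institute, is a way to specify data-processing workflows in a syntax that is easy to read and write. With WDL you can chain several complex tasks together and run independent ones in parallel. Cromwell is the engine that executes WDL workflows. Using the two together is a common way to run GATK.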

The GATK team provides practice data on Google Drive: https://drive.google.com/drive/folders/1hxoBIxJ9kF5-dYFgzbCnMRqeXpixqf3k

Download the folder named "gatk_bundle_1905".

The presenter's first example prints "hello world".
Let's first look at the code in hello_world_0.wdl (opened in vim):

workflow HelloWorld {
        call WriteGreeting
}
task WriteGreeting {
        command {
                echo "Hello World" # put the command you want to run here
        }
        output {
                File outfile = stdout() # stdout() returns the file that captures the command's standard output
        }
}

Then run the WDL script:

$ java -jar /gpfs/home/fy04/GATK/gatk_bundle_1905/jars/cromwell-38.jar run /gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/hello_world_0.wdl
# once it starts, a long log like the following is printed
[2020-09-11 12:03:27,36] [info] Running with database db.url = jdbc:hsqldb:mem:8dd6f3e9-d36a-4536-995b-379b3151ffb1;shutdown=false;hsqldb.tx=mvcc
[2020-09-11 12:03:40,06] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2020-09-11 12:03:40,26] [info] [RenameWorkflowOptionsInMetadata] 100%
[2020-09-11 12:03:40,77] [info] Running with database db.url = jdbc:hsqldb:mem:f93a480d-fcb9-4166-8b7b-cb87900513fc;shutdown=false;hsqldb.tx=mvcc
[2020-09-11 12:03:41,40] [info] Slf4jLogger started
[2020-09-11 12:03:41,73] [info] Workflow heartbeat configuration:
{
  "cromwellId" : "cromid-cab54e0",
  "heartbeatInterval" : "2 minutes",
  "ttl" : "10 minutes",
  "writeBatchSize" : 10000,
  "writeThreshold" : 10000
}
[2020-09-11 12:03:41,85] [info] Metadata summary refreshing every 2 seconds.
[2020-09-11 12:03:41,95] [warn] 'docker.hash-lookup.gcr-api-queries-per-100-seconds' is being deprecated, use 'docker.hash-lookup.gcr.throttle' instead (see reference.conf)
[2020-09-11 12:03:42,06] [info] CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
[2020-09-11 12:03:42,06] [info] KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
[2020-09-11 12:03:42,06] [info] WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
[2020-09-11 12:03:42,25] [info] JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
[2020-09-11 12:03:46,68] [info] SingleWorkflowRunnerActor: Version 38
[2020-09-11 12:03:46,76] [info] SingleWorkflowRunnerActor: Submitting workflow
[2020-09-11 12:03:46,89] [info] Unspecified type (Unspecified version) workflow a787e0b8-4640-4e62-95a0-51f92b4af76e submitted
[2020-09-11 12:03:46,96] [info] SingleWorkflowRunnerActor: Workflow submitted a787e0b8-4640-4e62-95a0-51f92b4af76e
[2020-09-11 12:03:46,96] [info] 1 new workflows fetched
[2020-09-11 12:03:46,96] [info] WorkflowManagerActor Starting workflow a787e0b8-4640-4e62-95a0-51f92b4af76e
[2020-09-11 12:03:46,97] [info] WorkflowManagerActor Successfully started WorkflowActor-a787e0b8-4640-4e62-95a0-51f92b4af76e
[2020-09-11 12:03:46,97] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2020-09-11 12:03:46,98] [info] WorkflowStoreHeartbeatWriteActor configured to flush with batch size 10000 and process rate 2 minutes.
[2020-09-11 12:03:47,55] [info] Not triggering log of token queue status. Effective log interval = None
[2020-09-11 12:03:48,45] [info] MaterializeWorkflowDescriptorActor [a787e0b8]: Parsing workflow as WDL draft-2
[2020-09-11 12:03:50,05] [info] MaterializeWorkflowDescriptorActor [a787e0b8]: Call-to-Backend assignments: HelloWorld.WriteGreeting -> Local
[2020-09-11 12:03:51,57] [info] WorkflowExecutionActor-a787e0b8-4640-4e62-95a0-51f92b4af76e [a787e0b8]: Starting HelloWorld.WriteGreeting
[2020-09-11 12:03:52,66] [info] Assigned new job execution tokens to the following groups: a787e0b8: 1
[2020-09-11 12:03:55,84] [info] BackgroundConfigAsyncJobExecutionActor [a787e0b8HelloWorld.WriteGreeting:NA:1]: echo "Hello World"
[2020-09-11 12:03:55,87] [info] BackgroundConfigAsyncJobExecutionActor [a787e0b8HelloWorld.WriteGreeting:NA:1]: executing: /bin/bash /gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/cromwell-executions/HelloWorld/a787e0b8-4640-4e62-95a0-51f92b4af76e/call-WriteGreeting/execution/script
[2020-09-11 12:03:57,10] [info] BackgroundConfigAsyncJobExecutionActor [a787e0b8HelloWorld.WriteGreeting:NA:1]: job id: 357665
[2020-09-11 12:03:57,10] [info] BackgroundConfigAsyncJobExecutionActor [a787e0b8HelloWorld.WriteGreeting:NA:1]: Status change from - to Done
[2020-09-11 12:03:58,75] [info] WorkflowExecutionActor-a787e0b8-4640-4e62-95a0-51f92b4af76e [a787e0b8]: Workflow HelloWorld complete. Final Outputs:
{
  "HelloWorld.WriteGreeting.outfile": "/gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/cromwell-executions/HelloWorld/a787e0b8-4640-4e62-95a0-51f92b4af76e/call-WriteGreeting/execution/stdout"
}
[2020-09-11 12:03:58,79] [info] WorkflowManagerActor WorkflowActor-a787e0b8-4640-4e62-95a0-51f92b4af76e is in a terminal state: WorkflowSucceededState
# here you can see the location of your output file and the ID of this workflow run; the ID is unique
[2020-09-11 12:04:05,76] [info] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.
{
  "outputs": {
    "HelloWorld.WriteGreeting.outfile": "/gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/cromwell-executions/HelloWorld/a787e0b8-4640-4e62-95a0-51f92b4af76e/call-WriteGreeting/execution/stdout"
  },
  "id": "a787e0b8-4640-4e62-95a0-51f92b4af76e"
}
[2020-09-11 12:04:07,15] [info] Workflow polling stopped
[2020-09-11 12:04:07,17] [info] Shutting down WorkflowStoreActor - Timeout = 5 seconds
[2020-09-11 12:04:07,17] [info] Shutting down WorkflowLogCopyRouter - Timeout = 5 seconds
[2020-09-11 12:04:07,17] [info] Shutting down JobExecutionTokenDispenser - Timeout = 5 seconds
[2020-09-11 12:04:07,17] [info] JobExecutionTokenDispenser stopped
[2020-09-11 12:04:07,17] [info] Aborting all running workflows.
[2020-09-11 12:04:07,18] [info] WorkflowStoreActor stopped
[2020-09-11 12:04:07,18] [info] WorkflowLogCopyRouter stopped
[2020-09-11 12:04:07,18] [info] Shutting down WorkflowManagerActor - Timeout = 3600 seconds
[2020-09-11 12:04:07,18] [info] WorkflowManagerActor All workflows finished
[2020-09-11 12:04:07,18] [info] WorkflowManagerActor stopped
[2020-09-11 12:04:07,87] [info] Connection pools shut down
[2020-09-11 12:04:07,88] [info] Shutting down SubWorkflowStoreActor - Timeout = 1800 seconds
[2020-09-11 12:04:07,88] [info] Shutting down JobStoreActor - Timeout = 1800 seconds
[2020-09-11 12:04:07,88] [info] Shutting down CallCacheWriteActor - Timeout = 1800 seconds
[2020-09-11 12:04:07,88] [info] Shutting down ServiceRegistryActor - Timeout = 1800 seconds
[2020-09-11 12:04:07,88] [info] Shutting down DockerHashActor - Timeout = 1800 seconds
[2020-09-11 12:04:07,88] [info] Shutting down IoProxy - Timeout = 1800 seconds
[2020-09-11 12:04:07,88] [info] SubWorkflowStoreActor stopped
[2020-09-11 12:04:07,88] [info] CallCacheWriteActor Shutting down: 0 queued messages to process
[2020-09-11 12:04:07,88] [info] JobStoreActor stopped
[2020-09-11 12:04:07,88] [info] KvWriteActor Shutting down: 0 queued messages to process
[2020-09-11 12:04:07,88] [info] WriteMetadataActor Shutting down: 0 queued messages to process
[2020-09-11 12:04:07,88] [info] CallCacheWriteActor stopped
[2020-09-11 12:04:07,88] [info] IoProxy stopped
[2020-09-11 12:04:07,88] [info] ServiceRegistryActor stopped
[2020-09-11 12:04:07,95] [info] DockerHashActor stopped
[2020-09-11 12:04:07,98] [info] Database closed
[2020-09-11 12:04:07,98] [info] Stream materializer shut down
[2020-09-11 12:04:07,98] [info] WDL HTTP import resolver closed

After the run, copy the output file location from the outputs above and take a look:

$ cd /gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/cromwell-executions/HelloWorld/a787e0b8-4640-4e62-95a0-51f92b4af76e/call-WriteGreeting/execution
$ ll
total 3
-rw------- 1 fy04 fy04    2 Sep 11 12:03 rc
-rw------- 1 fy04 fy04 2463 Sep 11 12:03 script
-rw------- 1 fy04 fy04  437 Sep 11 12:03 script.background
-rw------- 1 fy04 fy04  179 Sep 11 12:03 script.submit
-rw------- 1 fy04 fy04    0 Sep 11 12:03 stderr
-rw------- 1 fy04 fy04    0 Sep 11 12:03 stderr.background
-rw------- 1 fy04 fy04   12 Sep 11 12:03 stdout # this is the output file of our run
-rw------- 1 fy04 fy04   19 Sep 11 12:03 stdout.background
$ cat stdout
Hello World
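
Copying the long path by hand gets tedious after a few runs. As a rough sketch, assuming the directory layout seen in the run above (cromwell-executions/<workflow>/<workflow-id>/call-<task>/execution/stdout) and that you are in the directory where Cromwell was launched, a small shell function can locate the newest task stdout. The name latest_stdout is a hypothetical helper, not something Cromwell provides.

```shell
# Sketch: print the path of the newest task stdout under a Cromwell execution root.
# Assumed layout (taken from the run above):
#   <root>/<workflow>/<workflow-id>/call-<task>/execution/stdout
# latest_stdout is a hypothetical helper name, not part of Cromwell.
latest_stdout() {
    root="${1:-cromwell-executions}"
    # ls -t lists the newest file first; head keeps only that first path.
    ls -t "$root"/*/*/call-*/execution/stdout 2>/dev/null | head -n 1
}
```

With the function defined, `cat "$(latest_stdout)"` shows the output of the most recent task. It prints nothing if no matching file exists.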

The example above is about as simple as it gets. How do you use a variable in the command being run? Open hello_world_1.wdl to see:

workflow HelloWorld {
        call WriteGreeting
}
task WriteGreeting {
        String greeting # declare the variable here: its name is greeting and its type is String
        command {
                echo "${greeting}"
        }
        output {
                File outfile = stdout()
        }
}

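You also need a separate JSON file that assigns a value to the variable. The downloaded example files include hello_world.inputs.json. Note that the key must be the fully qualified name of the variable: the workflow name, then the task name, then the variable name. Here that is HelloWorld.WriteGreeting.greeting. JSON itself does not allow comments, so the file contains only the key and value:

$ cat hello_world.inputs.json
{
  "HelloWorld.WriteGreeting.greeting": "Hello Denmark"
}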
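
To make the key naming concrete, here is a minimal sketch that assembles an inputs file from its three parts. The file name my_inputs.json and the shell variable names are assumptions made for this illustration only. If your womtool version provides the `inputs` subcommand, it can print a template with the expected keys instead.

```shell
# Sketch: build a Cromwell inputs file whose key follows
#   <workflow name>.<task name>.<variable name>
# my_inputs.json is a hypothetical file name chosen for this example.
workflow="HelloWorld"
task="WriteGreeting"
variable="greeting"
value="Hello Denmark"

# The heredoc expands the shell variables into a plain JSON object.
cat > my_inputs.json <<EOF
{
  "${workflow}.${task}.${variable}": "${value}"
}
EOF

cat my_inputs.json
```

The resulting file can be passed to Cromwell with -i in the same way as hello_world.inputs.json.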

Run it:

$ java -jar /gpfs/home/fy04/GATK/gatk_bundle_1905/jars/cromwell-38.jar run /gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/hello_world_1.wdl -i /gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/hello_world.inputs.json
$ cat stdout
Hello Denmark

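Each example above ran only a single task. What if you need to chain two or more tasks together, so that the output of one becomes the input of the next?

The WDL script (hello_world_2.wdl) looks like this: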

workflow HelloWorld {
        call WriteGreeting # call the first task
        call ReadItBackToMe {
                input:
                        original_greeting = WriteGreeting.out
        }
        output {
                File outfile = ReadItBackToMe.outfile
        }
}
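# First task. The execution order comes from the call section above: ReadItBackToMe takes WriteGreeting.out as input, so it runs only after this task has finished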
task WriteGreeting {
        String greeting
        command {
                echo "${greeting}"
        }
        output {
                String out = read_string(stdout())
        }
}
# Second task
task ReadItBackToMe {
        String original_greeting
        command {
                echo "${original_greeting} to you too"
        }
        output {
                File outfile = stdout()
        }
}

Run it with the same inputs file:

$ java -jar /gpfs/home/fy04/GATK/gatk_bundle_1905/jars/cromwell-38.jar run /gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/hello_world_2.wdl -i /gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/hello_world.inputs.json
$ cat stdout
Hello Denmark to you too
# "Hello Denmark" is the output of the first task, which then becomes the input of the second task
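
The data flow between the two tasks can be pictured in plain shell. This is only an analogy, not how Cromwell actually executes the workflow: the shell variable names mirror the WDL declarations, and the greeting value is the one from hello_world.inputs.json.

```shell
# Shell analogy of hello_world_2.wdl (an illustration, not WDL itself).
greeting="Hello Denmark"    # value supplied by hello_world.inputs.json

# WriteGreeting: run the command and capture its stdout as a string.
# Like read_string(), command substitution drops the trailing newline.
out="$(echo "${greeting}")"

# ReadItBackToMe: the call block wires original_greeting = WriteGreeting.out
original_greeting="${out}"
result="$(echo "${original_greeting} to you too")"

echo "${result}"
```

The second step cannot start until `out` exists, which is the same dependency that makes ReadItBackToMe wait for WriteGreeting.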

Also, before running a WDL script, you can use womtool to check it for basic syntax errors:

$ java -jar /gpfs/home/fy04/GATK/gatk_bundle_1905/jars/womtool-38.jar validate /gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/hello_world_2.wdl
Success! # our WDL script has no problems, so womtool reports success

But if I change echo "${greeting}" in the script above to echo "${Sgreeting}" and run the validation command again, it reports:

$  java -jar /gpfs/home/fy04/GATK/gatk_bundle_1905/jars/womtool-38.jar validate /gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/hello_world_2.wdl
ERROR: Variable Sgreeting does not reference any declaration in the task (line 20, col 11):

                echo "${Sgreeting}"
          ^

Task defined here (line 15, col 6):

task WriteGreeting {
     ^

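Likewise, if I change String greeting in the script to Int greeting and validate again, this time passing the inputs file, womtool complains that the value "Hello Denmark" cannot be read as an Int: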

$ java -jar /gpfs/home/fy04/GATK/gatk_bundle_1905/jars/womtool-38.jar validate /gpfs/home/fy04/GATK/gatk_bundle_1905/hello_world/hello_world_2.wdl -i hello_world.inputs.json

Failed to evaluate input 'WriteGreeting.greeting' (reason 1 of 1): For input string: "Hello Denmark"