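Running your first WDL script
Alignment is one of the most basic operations in next-generation sequencing (NGS) data processing. This section shows how to use a WDL script to run a bwa alignment job that produces a BAM alignment file.
1. Obtaining the pipeline image and test data
- Obtaining the image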
极道 provides test data and Docker image resources for user testing. The image required for this test can be obtained from a Linux shell as follows.
Before pulling the image, configure the Docker registry settings and restart Docker:
wd@there:~$ echo '{ "insecure-registries":["107.150.123.203:8280"] }' | sudo tee /etc/docker/daemon.json
wd@there:~$ sudo systemctl restart docker
Once Docker is configured, pull the image:
wd@there:~$ sudo docker pull achelous.org:8280/library/bwa:0.7.16a
- Obtaining the test data
The test data can be downloaded and unpacked with the following commands:
wd@there:~$ cd /mnt/vol1/
wd@there:~$ wget -c http://achelous.org/download/demo-dataset.zip
wd@there:~$ unzip demo-dataset.zip
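The daemon.json command in the image step writes the whole file, so any settings already present on the host would be replaced. Where that is a concern, the registry entry can be merged into the existing content instead. The following is a minimal Python sketch of such a merge. The function name add_insecure_registry and the log-driver setting used in the demo are illustrative assumptions, not part of Achelous or of this tutorial.

```python
import json


def add_insecure_registry(config_text: str, registry: str) -> str:
    """Return daemon.json text with `registry` listed under insecure-registries.

    Existing settings are preserved; empty input is treated as an empty config.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    registries = config.setdefault("insecure-registries", [])
    if registry not in registries:  # avoid duplicate entries on repeated runs
        registries.append(registry)
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Hypothetical existing configuration, used only for illustration.
    existing = '{"log-driver": "json-file"}'
    print(add_insecure_registry(existing, "107.150.123.203:8280"))
```

The returned text would still need to be written back to /etc/docker/daemon.json with root privileges, followed by a Docker restart.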
2. Adding the pipeline
Download and unpack the demo pipeline files, then register the pipeline:
wd@there:~$ cd ~
wd@there:~$ wget http://achelous.org/download/wdl-demo-pipeline.zip
wd@there:~$ unzip wdl-demo-pipeline.zip
wd@there:~$ cd wdl-demo-pipeline/
wd@there:~$ biocli pipeline add demo.pipeline.json -d src/
The wdl-demo-pipeline directory contains three items:
- demo.pipeline.json: the pipeline information file, which records the pipeline's metadata. Its content is:
{
"Name":"FastqtoBam",
"Type" :"WDL",
"Description" : "Simple pipeline for fastq to bam by bwa mem",
"wdl" : {
"WorkflowFile" : "main.wdl"
}
}
- src/toy-pipeline.wdl: the WDL script of the pipeline.
### Achelous Demo pipeline ###
### contact us e-mail: di.wu@xtaotech.com ####
workflow fastqtobam {
  File fastq1
  File fastq2
  File Ref
  # Align the paired-end reads, then convert the SAM output to BAM
  call bwa_mem { input: fastq1 = fastq1, fastq2 = fastq2, Ref = Ref }
  call samtobam { input: sam = bwa_mem.sam }
  output {
    File bam_file = samtobam.bam
  }
}
task bwa_mem {
  File fastq1
  File fastq2
  File Ref
  command {
    /bio/bwa/bwa mem ${Ref} -t 5 ${fastq1} ${fastq2} > toy.sam
  }
  output {
    File sam = "toy.sam"
  }
  runtime {
    docker: "bwa:0.7.16a"
    cpu: "5"
    memory: "2G"
  }
}
task samtobam {
  File sam
  command {
    /bio/samtools/samtools view -bS ${sam} -o toy.bam
  }
  output {
    File bam = "toy.bam"
  }
  runtime {
    docker: "bwa:0.7.16a"
    cpu: "1"
    memory: "2G"
  }
}
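This pipeline takes next-generation sequencing FASTQ files through two steps: alignment with bwa, and file format conversion with samtools.
The pipeline takes three parameters: the two paired-end FASTQ files (fastq1 and fastq2) and the reference sequence (Ref).
For details on writing WDL scripts, see the WDL tutorial.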
- demo.job.json: the job file
This file records the parameters needed to submit the job. Its content is:
{
"Name" : "My_first_WDL_job",
"Pipeline" : "FastqtoBam",
"WorkDir" : "",
"InputDataSet" : {
"WorkflowInput" : {
"fastqtobam.fastq1" : "",
"fastqtobam.fastq2" : "",
"fastqtobam.ref" : ""
}
},
"Priority" : 7
}
The WorkDir field and the entries under WorkflowInput must be filled in by the user.
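The ${...} placeholders in the command blocks of the WDL script are replaced with the task's input values before the command runs. The sketch below imitates that substitution in Python to show what the samtobam command looks like once sam is filled in. It is a simplified illustration only: render_command is a hypothetical helper, names are matched exactly, and a real WDL engine does more (for example, it may stage input files and pass different paths).

```python
import re


def render_command(template: str, inputs: dict) -> str:
    """Replace each ${name} placeholder with the matching input value."""

    def substitute(match):
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"no value supplied for input '{name}'")
        return str(inputs[name])

    return re.sub(r"\$\{(\w+)\}", substitute, template)


if __name__ == "__main__":
    # Command template copied from the samtobam task.
    template = "/bio/samtools/samtools view -bS ${sam} -o toy.bam"
    print(render_command(template, {"sam": "toy.sam"}))
```

A missing input raises KeyError here, which mirrors the idea that every declared input of a task must receive a value before the task can run.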
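3. Editing the job parameters
Edit the demo.job.json file. WorkDir is the output path for the pipeline run, and the three WorkflowInput parameters correspond to the paths of read1.fastq, read2.fastq and ref.fa in the test dataset.
The Achelous system accepts absolute file paths for these parameters.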
{
"Name" : "My_first_WDL_job",
"Pipeline" : "FastqtoBam",
"WorkDir" : "/mnt/vol1/my_first_wdl_job",
"InputDataSet" : {
"WorkflowInput" : {
"fastqtobam.fastq1" : "/mnt/vol1/demo_data_set/read1.fastq",
"fastqtobam.fastq2" : "/mnt/vol1/demo_data_set/read2.fastq",
"fastqtobam.ref" : "/mnt/vol1/demo_data_set/ref.fa"
}
},
"Priority" : 7
}
Save the file and exit the editor when done.
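When many jobs are prepared, the same edit can be scripted rather than done by hand. The sketch below fills WorkDir and the workflow inputs into a job template and reports any field that is still empty or not an absolute path. The helper names fill_job and find_problems are hypothetical, and the absolute-path check is an assumption based on the statement above about absolute paths; the platform itself may accept other forms.

```python
import json
import posixpath


def fill_job(template: dict, work_dir: str, inputs: dict) -> dict:
    """Return a copy of the job template with WorkDir and workflow inputs set."""
    job = json.loads(json.dumps(template))  # deep copy of plain JSON data
    job["WorkDir"] = work_dir
    job["InputDataSet"]["WorkflowInput"].update(inputs)
    return job


def find_problems(job: dict) -> list:
    """List the fields that are empty or not absolute paths."""
    fields = {"WorkDir": job.get("WorkDir", "")}
    fields.update(job.get("InputDataSet", {}).get("WorkflowInput", {}))
    return [name for name, path in fields.items()
            if not path or not posixpath.isabs(path)]
```

A typical use would load demo.job.json with json.load, call fill_job with the input keys exactly as they appear in that file, check that find_problems returns an empty list, and write the result back with json.dump.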
4. Submitting the job
Once the job JSON file has been edited, submit the job from the Linux shell with the following command:
wd@there:~$ biocli job submit demo.job.json
The job added success, job ID is: d4ae1a5b-65e9-4106-4c44-2e44364f799d
In this example, a successful submission returns the job ID d4ae1a5b-65e9-4106-4c44-2e44364f799d.
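In a script, the job ID usually has to be captured from the submission message so that it can be passed to later commands. A minimal sketch follows, assuming the message has the form shown in this example; the wording may differ between versions, and extract_job_id is a hypothetical helper rather than part of biocli.

```python
import re
from typing import Optional

# Matches the UUID-style ID that follows "job ID is:" in the message above.
JOB_ID = re.compile(
    r"job ID is:\s*([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def extract_job_id(output: str) -> Optional[str]:
    """Return the job ID found in the submit output, or None if absent."""
    match = JOB_ID.search(output)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = "The job added success, job ID is: d4ae1a5b-65e9-4106-4c44-2e44364f799d"
    print(extract_job_id(line))
```

Returning None instead of raising lets the caller decide how to handle a failed submission.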
5. Checking the job status
To check the status of a job, run the following command:
wd@there:~$ biocli job status d4ae1a5b-65e9-4106-4c44-2e44364f799d
Status of Job d4ae1a5b-65e9-4106-4c44-2e44364f799d:
Name: My_first_WDL_job
Pipeline: FASTQTOBAM
State: CREATED
Owner: C
WorkDir: /mnt/vol1/my_first_wdl_job
PausedState: N/A
Created: 2021-04-21T18:26:39+08:00
Finished: N/A
RetryLimit: 3
RunCount: 0
UserStageCount: 0
StageQuota: -1
Priority: 7
FailReason:
GraphBuildStatus: Completed
DoneStages: No stage done
RunningStages: No stage running
WaitingStages: No stage waiting
ForbiddenStages: No stage forbidden
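Besides the full job ID, any substring of the job ID can also be used for the query. The returned information includes the job state, the working directory and other details.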
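The status report is a list of "Key: value" lines, which makes it easy to read from a script. The sketch below turns such a report into a dictionary. It assumes the layout shown above, and parse_status is a hypothetical helper rather than a biocli feature.

```python
def parse_status(text: str) -> dict:
    """Turn 'Key: value' lines of a status report into a dict.

    The 'Status of Job ...' header and lines without a colon are skipped.
    """
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if not sep or key.startswith("Status of Job"):
            continue
        fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    sample = (
        "Status of Job d4ae1a5b-65e9-4106-4c44-2e44364f799d:\n"
        "Name: My_first_WDL_job\n"
        "State: CREATED\n"
        "Created: 2021-04-21T18:26:39+08:00\n"
    )
    print(parse_status(sample)["State"])
```

A monitoring script could run the status command at intervals and inspect the State field of the parsed result.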
6. Viewing the job results
After the job has finished, running the status command returns information like the following:
Status of Job d4ae1a5b-65e9-4106-4c44-2e44364f799d:
Name: wd
Pipeline: wes
State: FINISHED
Owner: uec
WorkDir: /mnt/vol1/my_first_wdl_job
PausedState:
Created: 2021-4-21T09:33:26Z
Finished: 2021-4-21T09:35:58Z
RetryLimit: 3
Users can then go to the results path to view the corresponding output files:
wd@there:~$ cd /mnt/vol1/my_first_wdl_job
wd@there:~$ ls -alR ./
fastqtobam-fastqtobam.bwa_mem fastqtobam-fastqtobam.samtobam logs wdllibiofiles
/autofs/vol6/Bioinformatics-pipeline/demo_results/fastqtobam-fastqtobam.bwa_mem:
toy.sam
/autofs/vol6/Bioinformatics-pipeline/demo_results/fastqtobam-fastqtobam.samtobam:
toy.bam
...
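The same check can be done from a script by searching the working directory for the expected output files. The following is a minimal sketch; find_outputs is a hypothetical helper, and the per-task subdirectory layout is taken only from the listing above.

```python
from pathlib import Path


def find_outputs(work_dir: str, suffix: str = ".bam") -> list:
    """Return the files under work_dir whose names end with suffix, sorted."""
    return sorted(
        str(path) for path in Path(work_dir).rglob(f"*{suffix}") if path.is_file()
    )
```

For the job in this tutorial, find_outputs("/mnt/vol1/my_first_wdl_job") would be expected to include the toy.bam file written by the samtobam task; an empty list suggests the job has not produced its final output.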