The GWAS workflow - Optional arguments and defaults

You can easily change any arguments in the GWAS workflow.

Disclaimer

The pipeline is under active development for ongoing projects, so the arguments listed here may be outdated for the latest versions. We strive to keep this document up to date, but if in doubt you can verify the active arguments of the version you are running by checking the following .config files (replace v.1.2 with the version you want to run):

/pgen_int_data_resources/workflows/BRS_tools_GWAS_nf/v.1.2/nextflow.config
/pgen_int_data_resources/workflows/BRS_tools_GWAS_nf/v.1.2/config_profiles/cluster.config
/pgen_int_data_resources/workflows/BRS_tools_GWAS_nf/v.1.2/config_profiles/cloud.config

nextflow.config specifies all major arguments for the pipeline. Additional arguments specific to the HPC cluster or the cloud environment are set in cluster.config and cloud.config, respectively.
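As a sketch of the check described above, the snippet below builds the three config paths for a chosen version and searches them for one argument. The `version` and `param` variables are illustrative names, and the search assumes the argument appears in the config file under the same name that is used on the command line.

```shell
#!/usr/bin/env bash
# Sketch: look up an argument in the config files of one pipeline version.
version="v.1.2"        # change to the version you are running
param="traitType"      # argument to look up (illustrative choice)

base="/pgen_int_data_resources/workflows/BRS_tools_GWAS_nf/${version}"
configs=(
  "${base}/nextflow.config"
  "${base}/config_profiles/cluster.config"
  "${base}/config_profiles/cloud.config"
)

for f in "${configs[@]}"; do
  if [ -r "$f" ]; then
    # -n prints line numbers; "|| true" keeps the loop going on no match
    grep -n -- "$param" "$f" || true
  else
    echo "not readable: $f"
  fi
done
```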

Changing arguments

To change an argument, add its name prefixed with a double dash, followed by the value in single quotes, e.g.,

--traitType 'quantitative'

To pass the value of a shell variable, use double quotes instead so that bash expands the variable:

#set variable trait in bash
trait=quantitative

#pass the variable as an argument by adding to the nextflow execution script:
--traitType "$trait"
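When several arguments are overridden at once, it can help to collect them in a bash array before launching. The following is only a sketch: `main.nf` is a placeholder for the workflow entry point, which this page does not name, and the command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: collect argument overrides in an array and print the command.
# "main.nf" is a placeholder, not the real entry point of the workflow.
trait=quantitative

args=(
  --traitType "$trait"     # double quotes: bash substitutes the variable
  --skip_siteQC 'false'    # single quotes: passed exactly as written
)

# Print one word per line to show what nextflow would receive.
printf '%s\n' nextflow run main.nf "${args[@]}"
```

Keeping each name next to its value in the array makes it easy to comment out or add a single override without touching the rest of the command.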

List of process arguments

gwas_maskingQC

This process sets genotypes that fail certain criteria to missing (./.), performs QC at the VCF level, and transforms the VCF files to bgen and pgen.

It is turned off by default: masking is usually done once per aggregate, because working on VCF files directly is very computationally intensive.

| Argument | Default | Usage |
|---|---|---|
| --skip_maskingQC | true | Set to true to skip masking. It is true (masking OFF) by default because masking is very computationally intensive. |
| --min_fmt_dp | 10 | Masking genotype depth for autosomes |
| --min_fmt_gq | 20 | Masking genotype quality for autosomes |
| --min_fmt_gq_females | 20 | Masking genotype quality for chrX females |
| --min_fmt_dp_females | 10 | Masking genotype depth for chrX females |
| --min_fmt_dp_males | 5 | Masking genotype depth for chrX males |
| --pvalue_fmt_abratio | 0.001 | Masking AB ratio P-value threshold |
| --vcfQC_additional_args | -i \'INFO/OLD_MULTIALLELIC="."\' | Additional QC parameters at the VCF level. The argument is passed directly to bcftools as: bcftools view -i ${params.vcfQC_additional_args}. The default keeps bi-allelic variants only. |
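To make the depth and quality thresholds concrete, here is a minimal sketch of the autosomal masking rule using the default values above. The helper `is_masked` is hypothetical, and the sketch assumes that a genotype fails when its value is strictly below the threshold; the pipeline's exact comparison is not stated on this page.

```shell
#!/usr/bin/env bash
# Sketch of the autosomal masking rule, using the documented defaults.
min_fmt_dp=10   # --min_fmt_dp
min_fmt_gq=20   # --min_fmt_gq

# Succeeds (exit 0) when a genotype should be set to missing (./.).
# Assumption: values strictly below a threshold fail it.
is_masked() {
  local dp=$1 gq=$2
  (( dp < min_fmt_dp || gq < min_fmt_gq ))
}

# Toy DP/GQ pairs, made up for illustration.
for call in "30 99" "8 40" "15 12"; do
  read -r dp gq <<< "$call"
  if is_masked "$dp" "$gq"; then
    echo "DP=$dp GQ=$gq -> ./."
  else
    echo "DP=$dp GQ=$gq -> kept"
  fi
done
```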

vcftopgen

The vcftopgen process is activated when the input is VCF but masking is disabled. It creates the pgen files that are passed downstream.

QC at the VCF level is also possible (without having to mask).

| Argument | Default | Usage |
|---|---|---|
| --vcfQC_additional_args | -i \'INFO/OLD_MULTIALLELIC="."\' | Additional QC parameters at the VCF level. The argument is passed directly to bcftools as: bcftools view -i ${params.vcfQC_additional_args}. The default keeps bi-allelic variants only. |
| --plink_set_missing_var_ids | @:#\$r\$a | Template used to set variant IDs when they are missing. Only matters if input is VCF. Corresponds to the --set-missing-var-ids option of plink2. |
| --plink_new_id_max_allele_len | 1000 | Maximum allele length allowed when building new variant IDs. Only matters if input is VCF. Corresponds to the --new-id-max-allele-len option of plink2. |
| --plink_vcf_half_call | m | How to deal with VCF half calls. They are converted to missing by default. Only matters if input is VCF. Corresponds to the --vcf-half-call option of plink2. |
| --plinkmem | 2000 | Memory argument for plink |
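For orientation, the defaults above correspond roughly to the plink2 call sketched below. This is an illustration, not the pipeline's actual command: the input file and output prefix are placeholders, and mapping `--plinkmem` to plink2's `--memory` is an assumption. The backslashes in the documented default `@:#\$r\$a` are presumably escaping for the config file; on a bash command line, single quotes achieve the same thing.

```shell
#!/usr/bin/env bash
# Sketch: a plink2 call using the vcftopgen defaults (printed, not run).
# input.vcf.gz and output_prefix are placeholders.
plink_set_missing_var_ids='@:#$r$a'   # single quotes keep $r and $a literal
plink_new_id_max_allele_len=1000
plink_vcf_half_call=m
plinkmem=2000                         # assumed to map to plink2 --memory

cmd=(
  plink2
  --vcf input.vcf.gz
  --set-missing-var-ids "$plink_set_missing_var_ids"
  --new-id-max-allele-len "$plink_new_id_max_allele_len"
  --vcf-half-call "$plink_vcf_half_call"
  --memory "$plinkmem"
  --make-pgen
  --out output_prefix
)

# Print one word per line; the ID template is a single unmodified word.
printf '%s\n' "${cmd[@]}"
```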

gwas_siteQC

The gwas_siteQC process performs filtering QC at the bgen/pgen/bed level (files in these formats are generated as necessary). It uses plink2 to convert between the bgen/pgen/bed formats and plink1.9 to test for differential missingness and HWE.

The process first filters for maf > 0.5% and missingness < 2% by default (change with --siteQCplink_add_args).

For binary traits, a test for differential missingness is performed first (variants with P > 1e-5 are kept by default, option --thres_m). This is followed by an HWE test that uses only unrelated controls (P > 1e-6 by default, option --thres_HWE); on chromosome X, only females are used for the HWE test.

For quantitative traits, the process does not perform a differential missingness test. It performs only an HWE test on the unrelated individuals of the whole sample, again using females only on chromosome X.
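The keep-or-drop decision for binary traits can be sketched as a two-threshold filter. The variant IDs and P-values below are made up for illustration, and the three-column layout is a simplification rather than actual plink output. For quantitative traits only the HWE condition would apply.

```shell
#!/usr/bin/env bash
# Sketch: keep variants that pass both site-QC P-value thresholds.
thres_m=0.00001      # --thres_m, differential missingness
thres_HWE=0.000001   # --thres_HWE, Hardy-Weinberg equilibrium

# Toy input, one variant per line: id, missingness P-value, HWE P-value.
toy_results() {
  printf '%s\n' \
    'var1 0.5 0.2' \
    'var2 0.000001 0.3' \
    'var3 0.4 0.0000001'
}

# A variant is kept only if both P-values exceed their thresholds.
# Adding 0 forces awk to compare the values as numbers.
kept=$(toy_results | awk -v tm="$thres_m" -v th="$thres_HWE" \
  '($2 + 0) > (tm + 0) && ($3 + 0) > (th + 0) { print $1 }')

echo "kept: $kept"
```

awk is used because bash arithmetic only handles integers, while the thresholds are small decimal fractions.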

| Argument | Default | Usage |
|---|---|---|
| --skip_siteQC | false | Set to true to skip siteQC. It is false by default. |
| --siteQCplink_add_args | --maf 0.005 --geno 0.02 | Filtering arguments given to plink1.9 for site QC. By default maf > 0.005 and missingness < 0.02. |
| --thres_m | 0.00001 | Differential missingness filter P-value |
| --thres_HWE | 0.000001 | HWE on unrelated controls filter P-value |
| --plinkmem | 2000 | Memory argument for plink |

SAIGE

Arguments for two processes corresponding to the 2-step SAIGE association analysis.

gwas_SAIGE_fit_null_glmm

| Argument | Default | Usage |
|---|---|---|
| --traitType | binary | Define trait type as binary or quantitative |
| --saigeStep1ExtraFlags | | SAIGE extra flags for controlling step 1 |
| --rdaFile | FALSE | Use this argument to provide the rdaFile directly if pre-generated. Combine with varianceRatioFile. If provided, gwas_SAIGE_fit_null_glmm is skipped. |
| --varianceRatioFile | FALSE | Use this argument to provide the varianceRatioFile directly if pre-generated. Combine with rdaFile. If provided, gwas_SAIGE_fit_null_glmm is skipped. |

gwas_SAIGE_spa_tests_bgen

| Argument | Default | Usage |
|---|---|---|
| --saigeStep2ExtraFlags | | SAIGE extra flags for controlling step 2 |

GCTA

Arguments for two processes corresponding to the 3-step GCTA association analysis.

sparseGRM

| Argument | Default | Usage |
|---|---|---|
| --GRM_sparse_cutoff | 0.05 | Relatedness cut-off for creating the sparse GRM |

gwas_GCTA_fit_null_glmm

| Argument | Default | Usage |
|---|---|---|
| --sparseGRM | | Use this argument to provide precomputed GRM files. If provided, the sparseGRM process is skipped. Provide the files as a string in the following way: prefix.{grm.id,grm.sp} |
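Because the value contains braces, quoting matters when it is typed on a bash command line. The sketch below shows the difference; it assumes the pipeline resolves the brace pattern itself, which is why the pattern should arrive as a single string. `prefix` stands for your own file prefix.

```shell
#!/usr/bin/env bash
# Sketch: why the --sparseGRM value should be quoted.

# Unquoted, bash expands the braces into two separate words.
unquoted=( prefix.{grm.id,grm.sp} )
printf 'unquoted word: %s\n' "${unquoted[@]}"

# Quoted, the pattern stays a single string for the pipeline to resolve.
quoted='prefix.{grm.id,grm.sp}'
printf 'quoted word:   %s\n' "$quoted"
```

With the unquoted form, only the first expanded word would be taken as the value of the argument, so the single-quoted form is the safer choice.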

Plotting

| Argument | Default | Usage |
|---|---|---|
| --manhattan_Pcutoff | 1e-50 | Collapse P-values above threshold |
| --manhattan_Pylim | 5e-8 | y-axis limit (P-value) |

List of LSF cluster profile arguments

Arguments that control the cluster submission and error control of the pipeline.

Fitting the null model for SAIGE and GCTA, or inferring the GRM for GCTA, sometimes requires fine control over the memory, queue and clusterOptions settings. The arguments below allow this.

Memory arguments

| Argument | Default | Usage |
|---|---|---|
| --memory | 5 GB | Default memory for all processes unless specified otherwise |
| --memory_nullfit | 10 GB | Memory for processes gwas_SAIGE_fit_null_glmm and gwas_GCTA_fit_null_glmm |
| --memory_grm | 80 GB | Memory for process sparseGRM (GCTA) |
| --memory_plotting | 10 GB | Memory for plotting |

Queue arguments

| Argument | Default | Usage |
|---|---|---|
| --queue | short | Default queue for all processes unless specified otherwise |
| --queue_nullfit | short | Queue for processes gwas_SAIGE_fit_null_glmm and gwas_GCTA_fit_null_glmm |
| --queueGRM | short | Queue for processes gwas_SAIGE_fit_null_glmm and gwas_GCTA_fit_null_glmm |
| --queue_plotting | short | Queue for plotting |

Cluster options arguments

| Argument | Default | Usage |
|---|---|---|
| --clusterOptions | -P Bio -R rusage[mem=5000] -M 5000 | Default submission details for all processes unless specified otherwise |
| --clusterOptions_nullfit | | Submission details for processes gwas_SAIGE_fit_null_glmm and gwas_GCTA_fit_null_glmm |
| --clusterOptions_grm | -P Bio -R rusage[mem=80000] -M 80000 | Submission details for process sparseGRM (GCTA) |
| --clusterOptions_plotting | -P Bio -R rusage[mem=10000] -M 10000 | Submission details for process plotting |

errorStrategy arguments

| Argument | Default | Usage |
|---|---|---|
| --errorStrategy | 'finish' | Default error strategy. Terminates the pipeline on error after retrying the specified number of times. |
| --errorStrategy_GRM | 'finish' | Error strategy for processes gwas_SAIGE_fit_null_glmm (SAIGE), sparseGRM (GCTA) and gwas_GCTA_fit_null_glmm (GCTA). Terminates the pipeline on error after retrying the specified number of times. |
| --errorStrategy_SAIGE_step2 | 'finish' | Error strategy for processes gwas_SAIGE_spa_tests_bgen (SAIGE) and gwas_GCTA_spa_tests_pgen (GCTA). Terminates the pipeline on error after retrying the specified number of times. |
| --errorStrategy_plotting | 'finish' | Error strategy for process plotting. Terminates the pipeline on error. |
| --maxRetries | 3 | Number of retries before the pipeline terminates on error |
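In the defaults above, each memory value in GB is mirrored in the matching clusterOptions string in MB (5 GB with 5000, 80 GB with 80000). When raising memory for a process it therefore seems sensible to change both together. The sketch below does this with a hypothetical helper, `lsf_options`; that the two settings must agree is an inference from the defaults, not something the pipeline is documented to enforce.

```shell
#!/usr/bin/env bash
# Sketch: derive a matching clusterOptions string from a memory value in GB.
# lsf_options is a hypothetical helper, not part of the pipeline.
lsf_options() {
  local gb=$1 project=${2:-Bio}
  local mb=$(( gb * 1000 ))   # the documented defaults pair 5 GB with 5000
  printf -- '-P %s -R rusage[mem=%d] -M %d\n' "$project" "$mb" "$mb"
}

memory_grm=80
clusterOptions_grm=$(lsf_options "$memory_grm")

# Arguments to append to the nextflow command (printed, not executed).
printf '%s\n' \
  --memory_grm "${memory_grm} GB" \
  --clusterOptions_grm "$clusterOptions_grm"
```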