The Aggregate Variant Testing workflow changelog

Please always use the latest available version unless explicitly instructed otherwise. Older versions may still be available in the RE but will not be supported by our team.

Release v3.1

New features

This release adds RVtests as an additional method for running rare variant tests. Note that it is implemented following the method described in Nature, so not all of the functionality of RVtests is available. In particular, this implementation of RVtests does not use covariates.

Bugfixes

Fixed a bug where, during functional annotation filtering, variant consequence was not taken into account if you were filtering on an annotation (e.g. gnomAD frequency) and also allowing the inclusion of variants where that annotation was missing. This led to more variants passing the filter than expected.
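The intended behaviour of this fix can be illustrated with a minimal Python sketch. This is not the workflow's own code: the function name, parameter names, and the frequency threshold are hypothetical. It only shows the rule that a missing annotation may satisfy the annotation filter, but must never bypass the consequence check.

```python
from typing import Optional, Set


def passes_filter(consequence: str,
                  gnomad_af: Optional[float],
                  allowed_consequences: Set[str],
                  max_af: float,
                  keep_missing: bool) -> bool:
    """Return True if a variant passes both the consequence and annotation filters.

    All names here are hypothetical and used for illustration only.
    """
    consequence_ok = consequence in allowed_consequences

    if gnomad_af is None:
        # Annotation is missing: accept only if the user opted in to that.
        annotation_ok = keep_missing
    else:
        annotation_ok = gnomad_af <= max_af

    # Both conditions must hold, even when the annotation is missing.
    return consequence_ok and annotation_ok
```

With this rule, a synonymous variant that lacks a gnomAD frequency is rejected when only missense variants are allowed, even if variants with missing annotations are otherwise kept.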

Made improvements to the phenotype file processing, so now phenotype files with multiple blank lines at the end no longer cause workflow issues.
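As an illustration of the kind of clean-up involved, the sketch below drops blank lines from the end of a file's contents before parsing. It is an assumption about one possible approach, not the workflow's actual implementation, and the function name is hypothetical.

```python
def strip_trailing_blank_lines(lines):
    """Return a copy of `lines` with blank or whitespace-only lines removed from the end."""
    cleaned = list(lines)
    while cleaned and not cleaned[-1].strip():
        cleaned.pop()
    return cleaned
```

For example, `strip_trailing_blank_lines(["id\tpheno", "s1\t1", "", ""])` keeps only the header and the single sample row. Blank lines elsewhere in the file are left untouched.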

Notes

This version includes new options in the inputs.json file, so you will not be able to reuse the inputs.json file from v3.0.

Release v3.0

Major update and reworking of the entire pipeline. Please see the v3.x documentation page for an overview of all the new features. The list below covers just a few highlights.

New features

Now takes either BGEN or PGEN files as input for genomic data, instead of VCF (annotation input unchanged).

Can now run on any number of phenotypes, as long as all are defined in your phenotype file.

Functional filtering is now more flexible and allows AND and OR filtering in the same run.

Includes Regenie as an additional program for burden testing.

Release v2.3.1

Bugfixes

Minor updates to the options file.

New features

New functional annotation files (produced using VEP v99 in July 2021) are now used by default.

Release v2.3

Bugfixes

Fixed the options file so that a task job that fails while running (a transient job failure, as opposed to a job that executes fully but exits with an error code) is now rerun up to 5 times before the workflow stops.
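The retry behaviour described above can be sketched in Python as follows. This is only a conceptual illustration: in the workflow itself the behaviour is configured through the options file, and the exception class and function names below are hypothetical. The sketch assumes that "up to 5 times" means five reruns after the first attempt.

```python
class TransientJobFailure(Exception):
    """Hypothetical marker for a job that died while running (e.g. a node failure)."""


def run_with_retries(job, max_retries=5):
    """Run `job`, rerunning it after transient failures up to `max_retries` times.

    Returns a (result, attempts) tuple. Any other exception, such as one
    representing a non-zero exit code, propagates immediately without a retry.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return job(), attempts
        except TransientJobFailure:
            if attempts > max_retries:
                # Retries exhausted: give up and stop the workflow.
                raise
```

A job that fails transiently twice and then succeeds returns on its third attempt. A job that always fails transiently is attempted six times in total before the error is raised.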

New features

MIT-style license attached.

Release v2.2

Bugfixes

Fixed a bug in differential missingness checks when processing indels.

Empty output is now allowed and does not crash the workflow.

New features

The workflow is now tested for biallelic indels, too.

New memory and queue requirements make it easier to run on large cohorts with default settings.

Release v2.1

Bugfixes

Changed the default memory value for task create_regions_files, which was causing the workflow to crash on large cohorts.

Changed the declaration type of the memory value in the config file from Float to Int, to avoid issues with LSF flags on the HPC.

New features

New input options make it clearer how to run the workflow using a pre-computed GRM.

A new filter for differential missingness is added to the GRM creation step.

There is a new output file with counts of variants in each MAC category used.

Release v2.0.1

Bugfixes

During the VEP functional annotation filtering step, if the empty string is provided as the value of the variable "vep_severity_to_include", all variants are accepted. This matches the behaviour of "bcftools +split-vep -s worst:".
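A minimal Python sketch of this rule is shown below. It is not the workflow's code: the function name, the dictionary layout of a variant, and the assumption that the variable holds a comma-separated list of consequence terms are all hypothetical.

```python
def filter_by_severity(variants, vep_severity_to_include):
    """Keep variants whose consequence is listed in `vep_severity_to_include`.

    Assumes (for illustration only) a comma-separated list of consequence
    terms, and variants represented as dicts with a "consequence" key.
    """
    if vep_severity_to_include == "":
        # Empty string: no severity restriction, so every variant is accepted.
        return list(variants)

    allowed = set(vep_severity_to_include.split(","))
    return [v for v in variants if v["consequence"] in allowed]
```

Passing `""` returns the input unchanged, while passing `"missense_variant"` keeps only missense variants.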

Only autosomes are used to create the GRM for SAIGE-GENE, because some of the chrX files occasionally gave errors similar to reported bugs in indexing of sex chromosomes.

New features

You can now use chrX with both the "aggV2" and the "aggV2_PASS_UTRplus_proteincodinggenes" input variant datasets.

Release v2.0

Bugfixes

In case of gene-based input, i.e. "chromosome" file or "gene" file, during the VEP functional annotation filtering step, for each gene all variants are now included or excluded according to the "worst" Consequence on any transcript for that gene. In case of coordinate-based inputs, the "worst" Consequence at each location will be selected, as in previous behaviour. In case of "groups" input, the VEP functional annotation filtering step is skipped.
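The gene-based selection can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the function name and the tuple layout are hypothetical, and the severity table is a short excerpt chosen for the example rather than the full ordering used by VEP or by the workflow.

```python
# Illustrative excerpt only; lower rank means more severe.
SEVERITY_RANK = {
    "stop_gained": 0,
    "missense_variant": 1,
    "synonymous_variant": 2,
    "intron_variant": 3,
}


def worst_consequence_per_gene(annotations):
    """Map each (variant_id, gene) pair to its most severe consequence.

    `annotations` is an iterable of (variant_id, gene, transcript, consequence)
    tuples, one per transcript-level annotation (hypothetical layout).
    """
    worst = {}
    for variant_id, gene, _transcript, consequence in annotations:
        key = (variant_id, gene)
        current = worst.get(key)
        if current is None or SEVERITY_RANK[consequence] < SEVERITY_RANK[current]:
            worst[key] = consequence
    return worst
```

A variant annotated as intronic on one transcript of a gene and as missense on another is then treated as missense for that gene, independently of its consequence in any overlapping gene.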

Input genes, groups, or coordinate blocks that are split across more than one "chunk" of the input variant dataset are now processed as a whole, after the "chunks" are resized appropriately. Therefore, output results do not have a "__chunkXXXX" specification appended to each gene/group/coordinate-block name any more.

New features

You can now also specify inputs as SAIGE-GENE-like "groups" of variants.

Differential missingness filters have been introduced.

The GRM used by SAIGE-GENE can now be created very quickly by specifying the relevant plink files as an input.

You can choose to use the full "aggV2", or the much smaller "aggV2_PASS_UTRplus_proteincodinggenes", as an input variant dataset.

Release v1.0

First release - working by coordinate, i.e. following the boundaries of the input variant dataset's chunks strictly.