Where can I find information on de novo variants?
For the Main Programme V9 release, genome-wide de novo variant (DNV) annotation was performed for all trios that have been successfully run through the Genomics England Rare Disease Interpretation Pipeline.
The dataset comprises genome-wide DNV annotation for 13,949 trios from 12,609 families from the rare disease programme.
You can find the full documentation here: De novo variant research dataset
The DNV annotation pipeline flags likely DNVs for each trio using a set of filters applied to the multi-sample VCF output of the Platypus variant caller. The filters are grouped into two broad categories, _base_ and _stringent_, and each variant is flagged for every filter it fails. In general, we recommend using DNVs that pass the _stringent_ filter, as these are more likely to be true DNVs.
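The base/stringent logic can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the filter names are hypothetical placeholders, and the sketch assumes (this page does not confirm it) that passing the stringent tier also requires passing every base filter.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# HYPOTHETICAL filter names mapped to their category ("base" or "stringent").
# The real filter names are listed in the DNV research dataset documentation.
FILTER_CATEGORY: Dict[str, str] = {
    "example_depth_filter": "base",
    "example_parent_alt_reads_filter": "base",
    "example_allele_balance_filter": "stringent",
}


@dataclass
class Variant:
    chrom: str
    pos: int
    # Names of the filters this variant failed (one flag per failed filter).
    failed_filters: Set[str] = field(default_factory=set)


def passes(variant: Variant, tier: str) -> bool:
    """Return True if the variant failed no filter in the requested tier.

    ASSUMPTION: the stringent tier includes the base filters, so a
    stringent pass also requires a base pass. Unknown filter names raise
    KeyError rather than being silently ignored.
    """
    if tier not in ("base", "stringent"):
        raise ValueError(f"unknown tier: {tier!r}")
    tiers = {"base"} if tier == "base" else {"base", "stringent"}
    return not any(FILTER_CATEGORY[name] in tiers for name in variant.failed_filters)
```

Filtering a list with `[v for v in variants if passes(v, "stringent")]` then keeps only the higher-confidence candidates, while `passes(v, "base")` reproduces the looser base tier.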
The outputs of the DNV research dataset are:

| Output | Description |
|---|---|
| denovo_cohort_information | A LabKey table with cohort information for all participants included in the DNV dataset. Attributes include participant ID, sex, affection status, family ID, pedigree ID, and the path to each family's multi-sample VCF with flagged DNVs. |
| denovo_flagged_variants | A LabKey table of all variants that pass base_filter for all trios in the DNV dataset, with all flags from the DNV annotation pipeline for each variant. Due to size restrictions, the table does not include variants that fail base_filter; these can be found in the annotated multi-sample VCFs. |
| annotated multi-sample VCFs (family level) | Multi-sample VCFs for every family, with DNVs flagged in the FORMAT field. These VCFs are functionally annotated with VEP and are accessible on the filesystem in the directory /gel_data_resources/main_programme/denovo_variant_dataset/. File paths for each participant are listed in the denovo_cohort_information LabKey table. |
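Reading one of these family-level VCFs can be sketched with the Python standard library alone. The FORMAT key that carries the DNV flags is an assumption: `DNV` in the test below is a hypothetical placeholder, so check the `##FORMAT` lines in the VCF header for the real key names. A dedicated parser such as pysam may be more robust for production use.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def iter_format_values(
    lines: Iterable[str], key: str
) -> Iterator[Tuple[str, int, Dict[str, str]]]:
    """Yield (CHROM, POS, {sample: value of FORMAT `key`}) for each VCF record."""
    samples = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        fmt_keys = fields[8].split(":")
        if key not in fmt_keys:
            continue  # this record does not carry the requested key
        idx = fmt_keys.index(key)
        values = {}
        for name, sample_field in zip(samples, fields[9:]):
            parts = sample_field.split(":")
            # The VCF spec allows trailing FORMAT values to be dropped; treat as missing.
            values[name] = parts[idx] if idx < len(parts) else "."
        yield fields[0], int(fields[1]), values


def open_vcf(path: str) -> Iterator[str]:
    """Yield text lines from a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from handle
```

A typical call would be `for chrom, pos, flags in iter_format_values(open_vcf(path), key):`, with `path` taken from the denovo_cohort_information table and `key` from the VCF header.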
This page was last updated on 02 Apr 2020.