AggV2 functional annotation
VEP-annotated aggV2 files are available, showing the predicted consequences of all variants called in the aggV2 dataset.
The full functional annotation data can be found at:
/gel_data_resources/main_programme/aggregation/aggregate_gVCF_strelka/aggV2/functional_annotation
The functional annotation data is split into 1371 chunks.
The functional annotation files are compressed VCFs (vcf.gz). The CHROM, POS, REF, ALT, FILTER and INFO fields from the genomic data are preserved in the functional annotation files, but genotypes are dropped.
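As a minimal sketch of reading these files without a dedicated VCF library, the Python below keeps only the fields listed above from each record. The names parse_site and iter_sites and the example record are hypothetical. The sketch assumes the standard VCF column order and that the compressed files can be read with Python's gzip module.

```python
import gzip
from typing import Dict, Iterator

# Fixed columns in standard VCF order. ID and QUAL are parsed but not returned,
# because the text above lists only the six fields in KEPT as preserved.
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]
KEPT = ("CHROM", "POS", "REF", "ALT", "FILTER", "INFO")


def parse_site(line: str) -> Dict[str, str]:
    """Split one tab-separated VCF record and keep the preserved fields."""
    values = line.rstrip("\n").split("\t")
    record = dict(zip(VCF_COLUMNS, values))
    return {key: record[key] for key in KEPT}


def iter_sites(path: str) -> Iterator[Dict[str, str]]:
    """Stream a compressed VCF, skipping header lines that start with '#'."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("#"):
                yield parse_site(line)


# Made-up record for illustration only (not taken from aggV2).
print(parse_site("chr1\t12345\t.\tA\tG\t.\tPASS\tAC=2;AN=100\n"))
```

To process a whole chunk, iter_sites can be called with the path to one of the vcf.gz files in the directory above.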
VEP versions
There are several versions of the AggV2 functional annotation files. These represent different versions of VEP and corresponding Ensembl annotation, not changes in the aggregation itself.
The latest version available is VEP 109.
Plugins and annotation sources
The following VEP plugins were used in annotation:
- GREEN-VARAN
- CADD v1.6
- REVEL
- UTRannotator
- ClinPred
- NMD
- SpliceAI
- SpliceRegion
- MTR
- dbNSFP
- LOFTEE
- ClinVar
- MitoTIP
- gnomAD - included the following fields (with gnomADg_ prefix): AF, AF_afr, AF_mid, AF_amr, AF_asj, AF_eas, AF_sas, AF_fin, AF_nfe, AF_oth, AF_ami, AF_XY, AF_XX, faf95_sas, faf99_sas, faf95_eas, faf99_eas, faf95_amr, faf99_amr, faf95_afr, faf99_afr, faf95, faf99, faf95_nfe, faf99_nfe
- Genomics England allele frequencies - With the following fields: AC_whole_cohort, AN_whole_cohort, AF_whole_cohort, AC_cancer_all, AN_cancer_all, AF_cancer_all, AC_rd_all, AN_rd_all, AF_rd_all, AC_rd_probands, AN_rd_probands, AF_rd_probands, AC_afr, AN_afr, AF_afr, AC_eas, AN_eas, AF_eas, AC_eur, AN_eur, AF_eur, AC_sas, AN_sas, AF_sas, AC_unrelated_cohort, AN_unrelated_cohort, AF_unrelated_cohort, AC_cancer_unrel, AN_cancer_unrel, AF_cancer_unrel
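As an illustration of how the frequency fields listed above might be used, the following Python sketch checks whether a variant is rare according to one or more of them. The helper names, the 0.001 threshold and the example values are assumptions made for illustration. The sketch also assumes that the annotation has already been parsed into a dictionary of strings, that missing values appear as empty strings or '.', and that multiple values in one field are joined with '&'.

```python
from typing import Dict, Iterable, Optional


def max_frequency(annotation: Dict[str, str], fields: Iterable[str]) -> Optional[float]:
    """Largest parseable frequency among the given fields, or None if all are missing."""
    values = []
    for field in fields:
        raw = annotation.get(field, "")
        # Assumption: multiple values are '&'-joined; '' and '.' mean missing.
        for part in raw.split("&"):
            if part not in ("", "."):
                values.append(float(part))
    return max(values) if values else None


def is_rare(annotation: Dict[str, str], fields: Iterable[str], threshold: float = 0.001) -> bool:
    """True if no listed frequency exceeds the threshold (missing counts as rare)."""
    highest = max_frequency(annotation, fields)
    return highest is None or highest <= threshold


# Made-up values for illustration only.
example = {"gnomADg_AF": "0.0004", "gnomADg_AF_nfe": "0.002", "gnomADg_AF_eas": ""}
print(is_rare(example, ["gnomADg_AF"]))                    # True
print(is_rare(example, ["gnomADg_AF", "gnomADg_AF_nfe"]))  # False
```

Treating a missing frequency as rare is a design choice in this sketch. Analyses that need a frequency to be present should handle the None case separately.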
The following plugins have been purposely excluded for technical, governance or data reasons:
- MMSplice - This plugin crashed reproducibly on some problematic variants.
- SVoverlaps - Due to the lack of an aggregated set of GRCh38 structural variants.
- FunMotifs - Due to a lack of GRCh38 resources.
- M-CAP - The plugin only works for GRCh37.
- LD - Due to speed concerns. We recommend calculating LD outside of VEP.
- NearestGene - This plugin cannot be run in offline mode.
Extracting VEP annotation information
Annotation information for each variant is written to the INFO/CSQ field, with '|' as the field separator. To extract this information programmatically, we recommend using bcftools +split-vep.
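The layout of the CSQ field can be illustrated with a short Python sketch. It is not a replacement for bcftools +split-vep. It only shows how the '|'-separated values map onto the field names that VEP declares after "Format:" in the CSQ header line, and that one variant can carry several comma-separated consequence entries. The helper names, the header and the INFO value are made up for illustration, and real aggV2 files contain many more fields.

```python
import re
from typing import Dict, List


def csq_field_names(header_line: str) -> List[str]:
    """Extract the CSQ sub-field names from the ##INFO=<ID=CSQ,...> header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format:' list found in the CSQ header line")
    return match.group(1).strip().split("|")


def parse_csq(info: str, names: List[str]) -> List[Dict[str, str]]:
    """Return one dict per consequence entry found in an INFO string."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            entries = item[len("CSQ="):].split(",")
            return [dict(zip(names, entry.split("|"))) for entry in entries]
    return []  # no CSQ annotation on this record


# Made-up header and INFO value, for illustration only.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL|gnomADg_AF">')
info = "AC=2;CSQ=T|missense_variant|GENE1|0.0001,T|intron_variant|GENE2|"

names = csq_field_names(header)
for entry in parse_csq(info, names):
    # Empty strings mark fields with no value for that entry.
    print(entry["SYMBOL"], entry["Consequence"], entry["gnomADg_AF"] or "NA")
```

With bcftools, the field names present in a file can be printed with the split-vep plugin's -l (--list) option.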
For more information on VEP, please see the Ensembl VEP documentation page and the split-vep documentation.
Help and support
Please contact the Genomics England Service Desk about any issues related to the aggV2 aggregation or companion datasets, and include "aggV2" in the title or description of your inquiry.