AggV2 functional annotation

VEP-annotated aggV2 files are available, giving the predicted consequences of all variants called in the aggV2 dataset.

The full functional annotation data can be found at: /gel_data_resources/main_programme/aggregation/aggregate_gVCF_strelka/aggV2/functional_annotation.

The functional annotation data is split into 1371 chunks.

The functional annotation files are compressed VCFs (vcf.gz). The CHROM, POS, REF, ALT, FILTER and INFO fields from the genomic data are preserved in the functional annotation files, but genotypes are dropped.
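
As a quick sanity check of this layout, the sketch below builds a tiny toy sites-only VCF, compresses it, and confirms that every record has exactly the eight fixed VCF columns, with no FORMAT or sample columns. The file name and records are invented for illustration and are not taken from aggV2; it also assumes the annotation files follow the standard eight-column sites-only layout. On a real chunk, the same gzip and awk pipeline could be pointed at one of the vcf.gz files.

```shell
#!/bin/sh
set -eu

# Toy sites-only VCF; file name and records are invented for illustration only.
tmpdir=$(mktemp -d)
toy="$tmpdir/toy_chunk.vcf.gz"
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t1000\t.\tA\tG\t.\tPASS\tAC=2\n'
  printf 'chr1\t2000\t.\tC\tT\t.\tPASS\tAC=1\n'
} | gzip -c > "$toy"

# Print the number of tab-separated columns of each data record, then
# collapse duplicates. A sites-only VCF has 8 columns per record;
# genotype data would add a FORMAT column plus one column per sample.
ncols=$(gzip -dc "$toy" | awk -F'\t' '!/^#/ { print NF }' | sort -u)
echo "columns per record: $ncols"

rm -rf "$tmpdir"
```

If the command reports a single value of 8, the file carries site-level information only.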

VEP versions

There are several versions of the aggV2 functional annotation files. These correspond to different versions of VEP and the matching Ensembl annotation, not to changes in the aggregation itself.

The latest version available is VEP 109.

Plugins and annotation sources

The following VEP plugins were used in annotation:

  • GREEN-VARAN
  • CADD v1.6
  • REVEL
  • UTRannotator
  • ClinPred
  • NMD
  • SpliceAI
  • SpliceRegion
  • MTR
  • dbNSFP
  • LOFTEE
  • ClinVar
  • MitoTIP
  • gnomAD - With the following fields (each with the gnomADg_ prefix): AF, AF_afr, AF_mid, AF_amr, AF_asj, AF_eas, AF_sas, AF_fin, AF_nfe, AF_oth, AF_ami, AF_XY, AF_XX, faf95_sas, faf99_sas, faf95_eas, faf99_eas, faf95_amr, faf99_amr, faf95_afr, faf99_afr, faf95, faf99, faf95_nfe, faf99_nfe
  • Genomics England allele frequencies - With the following fields: AC_whole_cohort, AN_whole_cohort, AF_whole_cohort, AC_cancer_all, AN_cancer_all, AF_cancer_all, AC_rd_all, AN_rd_all, AF_rd_all, AC_rd_probands, AN_rd_probands, AF_rd_probands, AC_afr, AN_afr, AF_afr, AC_eas, AN_eas, AF_eas, AC_eur, AN_eur, AF_eur, AC_sas, AN_sas, AF_sas, AC_unrelated_cohort, AN_unrelated_cohort, AF_unrelated_cohort, AC_cancer_unrel, AN_cancer_unrel, AF_cancer_unrel

The following plugins have been purposely excluded for technical, governance or data reasons:

  • MMSplice - This plugin crashed reliably on some problem variants.
  • SVoverlaps - Due to the lack of an aggregated set of GRCh38 structural variants.
  • Funmotifs - Due to a lack of GRCh38 resources.
  • M-CAP - The plugin only works for GRCh37.
  • LD - Due to speed concerns. We recommend calculating LD outside of VEP.
  • NearestGene - This plugin cannot be run in offline mode.

Extracting VEP annotation information

Annotation information for each variant is written to the INFO/CSQ field, with '|' as the field separator. To extract this information programmatically, we recommend using bcftools +split-vep as follows:

# List the VEP annotation fields available to query
bcftools +split-vep test/split-vep.vcf -l | head

0   Allele
1   Consequence
2   IMPACT
3   SYMBOL
4   Gene
5   Feature_type
6   Feature
7   BIOTYPE
8   EXON
9   INTRON

# Example: extract CHROM, POS and Consequence information
bcftools +split-vep test/split-vep.vcf -f '%CHROM:%POS %Consequence\n'
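
To make the structure of INFO/CSQ concrete, the following sketch parses it by hand with awk on a toy VCF. The header, gene names and record are invented for illustration; real aggV2 files carry many more fields. It relies on the standard VEP VCF layout: the field order is declared after "Format:" in the CSQ header line, entries for different features (for example transcripts) are separated by commas, and fields within an entry are separated by '|'. This is only meant to show what split-vep does for you, not to replace it.

```shell
#!/bin/sh
set -eu

# Toy annotated VCF; the header and record are invented for illustration.
tmpdir=$(mktemp -d)
toy="$tmpdir/toy_annotated.vcf"
{
  printf '##fileformat=VCFv4.2\n'
  printf '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t1000\t.\tA\tG\t.\tPASS\tCSQ=G|missense_variant|MODERATE|GENE1,G|intron_variant|MODIFIER|GENE2\n'
} > "$toy"

out=$(awk -F'\t' '
  # Read the field order from the CSQ header line (text after "Format: ").
  /^##INFO=<ID=CSQ,/ {
    fmt = $0
    sub(/.*Format: /, "", fmt)   # drop everything up to the field list
    sub(/".*/, "", fmt)          # drop the closing quote and bracket
    nf = split(fmt, names, "[|]")
    for (i = 1; i <= nf; i++) idx[names[i]] = i
    next
  }
  # Skip all other header lines.
  /^#/ { next }
  {
    # INFO is column 8: semicolon-separated key=value pairs.
    n = split($8, info, ";")
    for (i = 1; i <= n; i++) {
      if (info[i] ~ /^CSQ=/) {
        csq = substr(info[i], 5)
        # One comma-separated entry per annotated feature.
        m = split(csq, entries, ",")
        for (j = 1; j <= m; j++) {
          split(entries[j], f, "[|]")
          print $1 ":" $2, f[idx["SYMBOL"]], f[idx["Consequence"]]
        }
      }
    }
  }
' "$toy")
echo "$out"

rm -rf "$tmpdir"
```

On the toy record this prints one line per CSQ entry: "chr1:1000 GENE1 missense_variant" followed by "chr1:1000 GENE2 intron_variant". With bcftools, the -d option of +split-vep should give a comparable one-line-per-consequence layout.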

For more information on VEP, please see the Ensembl VEP documentation page and the split-vep documentation.

Help and support

For any issues related to the aggV2 aggregation or companion datasets, please contact the Genomics England Service Desk and include "aggV2" in the title or description of your enquiry.