
AggV3 functional annotation

The Functional Annotation dataset provides functional consequence information for all variants included in aggV3. It was generated by running the Ensembl Variant Effect Predictor (VEP) on the aggV3 sites, which are split into sub-shard VCFs. The functional annotation VCFs follow the same sharding structure.

Functional annotation was performed with VEP v115, using the plugins and additional annotation sources described below.

Data format and availability

The functional annotation data consists of 3166 sub-shard VCFs. Their S3 paths are listed in a BED file at s3://512426816668-gel-data-resources/dragen3.7.8/AggV3_resources/manifests/functional_annotation/2025-12-24/functional_annotation_shards.bed.
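As a sketch of how the manifest might be used to find the shards covering a region of interest, the snippet below parses BED lines and returns the overlapping entries. It assumes that the fourth column holds the shard's S3 path, which is not stated here, and the sample rows, bucket name, and helper names are hypothetical.

```python
from typing import NamedTuple


class Shard(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    path: str   # assumed to be the fourth BED column


def parse_manifest(lines):
    """Parse BED lines into Shard records, skipping blank and header lines."""
    shards = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, path = line.split("\t")[:4]
        shards.append(Shard(chrom, int(start), int(end), path))
    return shards


def shards_overlapping(shards, chrom, start, end):
    """Return shards overlapping the half-open interval [start, end) on chrom."""
    return [s for s in shards if s.chrom == chrom and s.start < end and start < s.end]


# Hypothetical rows; real entries come from functional_annotation_shards.bed.
example_lines = [
    "chr1\t0\t1000000\ts3://example-bucket/shard_chr1_0.vcf.gz",
    "chr1\t1000000\t2000000\ts3://example-bucket/shard_chr1_1.vcf.gz",
    "chr2\t0\t1000000\ts3://example-bucket/shard_chr2_0.vcf.gz",
]

shards = parse_manifest(example_lines)
hits = shards_overlapping(shards, "chr1", 999_990, 1_000_010)
for shard in hits:
    print(shard.path)
```

A query that straddles a shard boundary, as here, returns both neighbouring shards, so downstream code should expect more than one path per region.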

The functional annotation files are compressed VCFs (vcf.gz) in which the CHROM, POS, REF, ALT, and FILTER fields are preserved from the site-level aggV3 VCFs.
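VEP's VCF output normally stores consequences in a pipe-delimited INFO field (CSQ by default), with the sub-field order declared in the header after "Format: ". Assuming these files use that default layout, which is not confirmed here, the annotations of a record could be unpacked as in the sketch below. The header line, record, and gene names are invented for illustration.

```python
def csq_fields(header_line):
    """Extract sub-field names from a VEP CSQ header line ('... Format: A|B|C">')."""
    description = header_line.split("Format: ", 1)[1]
    return description.rstrip('">').split("|")


def parse_csq(info, fields, key="CSQ"):
    """Return one dict per annotation block found under `key` in an INFO string."""
    for entry in info.split(";"):
        if entry.startswith(key + "="):
            value = entry[len(key) + 1:]
            # Blocks are comma-separated; sub-fields within a block are pipe-separated.
            return [dict(zip(fields, block.split("|"))) for block in value.split(",")]
    return []


# Invented header and INFO column, for illustration only.
header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|SYMBOL|gnomADg_AF_joint">'
)
info = "AC=2;CSQ=T|missense_variant|GENE1|0.0012,T|intron_variant|GENE2|0.0012"

fields = csq_fields(header)
annotations = parse_csq(info, fields)
for annotation in annotations:
    print(annotation["SYMBOL"], annotation["Consequence"])
```

Reading the field order from the header, rather than hard-coding it, keeps the parser valid if the set of plugins or custom annotations changes between releases.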

Plugins and annotation sources

The following VEP plugins were used for annotation:

| Plugin | Version | File path |
| --- | --- | --- |
| CADD | 1.6 | SNVs - s3://512426816668-public-data-resources/CADD/v1.6/GRCh38_v1.6/no_anno/whole_genome_SNVs.tsv.gz |
| | | INDELs - s3://512426816668-public-data-resources/CADD/v1.6/GRCh38_v1.6/no_anno/gnomad.genomes.r3.0.indel.tsv.gz |
| REVEL | 1.3 | s3://512426816668-public-data-resources/vep_resources/REVEL/revel_v1.3_GRCh38.tsv.gz |
| UTRannotator | GRCh38 | s3://512426816668-public-data-resources/utrannotator/uORF_5UTR_GRCh38_PUBLIC.txt |
| NMD | NA | NA |
| SpliceAI | 1.3 | SNVs - s3://512426816668-public-data-resources/SpliceAI/Predicting_splicing_from_primary_sequence-66029966/genome_scores_v1.3/spliceai_scores.raw.snv.hg38.vcf.gz |
| | | INDELs - s3://512426816668-public-data-resources/SpliceAI/Predicting_splicing_from_primary_sequence-66029966/genome_scores_v1.3/spliceai_scores.raw.indel.hg38.vcf.gz |
| SpliceRegion | NA | NA |
| LOFTEE | GRCh38 | Human ancestor fasta - s3://512426816668-public-data-resources/vep_resources/LOFTEE/Build-38/human_ancestor.fa.gz |
| | | GERP BigWig - s3://512426816668-public-data-resources/vep_resources/LOFTEE/Build-38/gerp_conservation_scores.homo_sapiens.GRCh38.bw |
| | | Conservation file - s3://512426816668-public-data-resources/vep_resources/LOFTEE/Build-38/loftee.sql |
| AlphaMissense | hg38 | s3://512426816668-public-data-resources/AlphaMissense/AlphaMissense_hg38.tsv.gz |
| MechPredict | NA | s3://512426816668-public-data-resources/MechPredict/MechPredict_input.tsv |
| gnomAD | 4.1 | s3://512426816668-public-data-resources/gnomad/v4.1/gnomad_4.1_subset_allchr.vcf.gz |

Note on gnomAD annotations

Annotations from gnomAD v4.1 are available in the functional annotation VCFs, where they appear with the prefix gnomADg_. They include allele frequencies derived from the genome, exome, and joint call-sets. We also include the statistical test results (stat_union_p_value, stat_union_test_name, and stat_union_gen_ancs) that gnomAD provides to flag variants whose allele frequencies differ significantly between the exome and genome datasets. Read more about gnomAD v4.1 in their announcement article.

The following INFO fields from gnomAD are included in the functional annotation VCFs:

Joint call-set: AF_joint, AF_joint_XX, AF_joint_XY, AF_joint_afr_XX, AF_joint_afr_XY, AF_joint_afr, AF_joint_ami_XX, AF_joint_ami_XY, AF_joint_ami, AF_joint_amr_XX, AF_joint_amr_XY, AF_joint_amr, AF_joint_asj_XX, AF_joint_asj_XY, AF_joint_asj, AF_joint_eas_XX, AF_joint_eas_XY, AF_joint_eas, AF_joint_fin_XX, AF_joint_fin_XY, AF_joint_fin, AF_joint_mid_XX, AF_joint_mid_XY, AF_joint_mid, AF_joint_nfe_XX, AF_joint_nfe_XY, AF_joint_nfe, AF_joint_remaining_XX, AF_joint_remaining_XY, AF_joint_remaining, AF_joint_sas_XX, AF_joint_sas_XY, AF_joint_sas, faf95_joint, faf99_joint, faf95_joint_afr, faf99_joint_afr, faf95_joint_amr, faf99_joint_amr, faf95_joint_eas, faf99_joint_eas, faf95_joint_mid, faf99_joint_mid, faf95_joint_nfe, faf99_joint_nfe, faf95_joint_sas, faf99_joint_sas

Exome versus genome statistical tests: stat_union_p_value, stat_union_test_name, stat_union_gen_ancs

Exomes: AF_exomes, AF_exomes_XX, AF_exomes_XY, AF_exomes_afr_XX, AF_exomes_afr_XY, AF_exomes_afr, AF_exomes_amr_XX, AF_exomes_amr_XY, AF_exomes_amr, AF_exomes_asj_XX, AF_exomes_asj_XY, AF_exomes_asj, AF_exomes_eas_XX, AF_exomes_eas_XY, AF_exomes_eas, AF_exomes_fin_XX, AF_exomes_fin_XY, AF_exomes_fin, AF_exomes_mid_XX, AF_exomes_mid_XY, AF_exomes_mid, AF_exomes_nfe_XX, AF_exomes_nfe_XY, AF_exomes_nfe, AF_exomes_remaining_XX, AF_exomes_remaining_XY, AF_exomes_remaining, AF_exomes_sas_XX, AF_exomes_sas_XY, AF_exomes_sas, faf95_exomes, faf99_exomes, faf95_exomes_afr, faf99_exomes_afr, faf95_exomes_amr, faf99_exomes_amr, faf95_exomes_eas, faf99_exomes_eas, faf95_exomes_mid, faf99_exomes_mid, faf95_exomes_nfe, faf99_exomes_nfe, faf95_exomes_sas, faf99_exomes_sas

Genomes: AF_genomes, AF_genomes_XX, AF_genomes_XY, AF_genomes_afr_XX, AF_genomes_afr_XY, AF_genomes_afr, AF_genomes_ami_XX, AF_genomes_ami_XY, AF_genomes_ami, AF_genomes_amr_XX, AF_genomes_amr_XY, AF_genomes_amr, AF_genomes_asj_XX, AF_genomes_asj_XY, AF_genomes_asj, AF_genomes_eas_XX, AF_genomes_eas_XY, AF_genomes_eas, AF_genomes_fin_XX, AF_genomes_fin_XY, AF_genomes_fin, AF_genomes_mid_XX, AF_genomes_mid_XY, AF_genomes_mid, AF_genomes_nfe_XX, AF_genomes_nfe_XY, AF_genomes_nfe, AF_genomes_remaining_XX, AF_genomes_remaining_XY, AF_genomes_remaining, AF_genomes_sas_XX, AF_genomes_sas_XY, AF_genomes_sas, faf95_genomes, faf99_genomes, faf95_genomes_afr, faf99_genomes_afr, faf95_genomes_amr, faf99_genomes_amr, faf95_genomes_eas, faf99_genomes_eas, faf95_genomes_nfe, faf99_genomes_nfe, faf95_genomes_sas, faf99_genomes_sas
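One possible use of these fields is to select rare variants while flagging sites where gnomAD reports discordant exome and genome frequencies. The sketch below assumes that a variant's annotations have already been parsed into a dictionary keyed by the gnomADg_-prefixed names, that missing values are empty strings or ".", and that each field holds a single numeric value. The thresholds (0.001 and 1e-4) and the helper names are illustrative only, not recommendations from gnomAD or this dataset.

```python
PREFIX = "gnomADg_"


def get_float(annotation, field):
    """Return the prefixed gnomAD field as a float, or None if absent or empty."""
    raw = annotation.get(PREFIX + field, "")
    if raw in ("", "."):
        return None
    return float(raw)


def is_rare(annotation, max_af=0.001):
    """True if the joint allele frequency is below max_af or the variant is not in gnomAD."""
    af = get_float(annotation, "AF_joint")
    return af is None or af < max_af


def is_discordant(annotation, max_p=1e-4):
    """True if the exome-versus-genome test p-value is present and below max_p."""
    p_value = get_float(annotation, "stat_union_p_value")
    return p_value is not None and p_value < max_p


# Invented annotation records, for illustration only.
records = [
    {"gnomADg_AF_joint": "0.00004", "gnomADg_stat_union_p_value": "0.81"},
    {"gnomADg_AF_joint": "0.0002", "gnomADg_stat_union_p_value": "3.2e-09"},
    {"gnomADg_AF_joint": "0.12", "gnomADg_stat_union_p_value": ""},
    {"gnomADg_AF_joint": ""},
]

for record in records:
    print(is_rare(record), is_discordant(record))
```

Treating variants absent from gnomAD as rare is a design choice; an analysis that needs an observed frequency would instead exclude records where AF_joint is missing.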

Additional annotations include:

| Resource | Version | File path |
| --- | --- | --- |
| GREEN-VARAN | 1.3.2 | s3://512426816668-public-data-resources/greenvaran/GRCh38_GREEN-DB.bed.gz |
| ClinVar | GRCh38 - 20250923 | s3://512426816668-public-data-resources/clinvar/20250923/clinvar_20250923.vcf.gz |
| PhyloP | GRCh38 - PhyloP100way | s3://512426816668-public-data-resources/phylop100way/hg38.phyloP100way.bw |