AggV3 functional annotation
The Functional Annotation dataset provides functional consequence information for all variants included in aggV3. The dataset was generated by running the Ensembl Variant Effect Predictor (VEP) on the aggV3 sites, which are split into sub-shard VCFs. The functional annotation VCFs follow the same sharding structure.
Functional annotation was performed using VEP v115 with the plugins and additional annotation sources described below.
Data format and availability
The functional annotation data consists of 3166 sub-shard VCFs. Their S3 paths are listed in a BED file at s3://512426816668-gel-data-resources/dragen3.7.8/AggV3_resources/manifests/functional_annotation/2025-12-24/functional_annotation_shards.bed.
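As a minimal sketch of how the manifest might be used, the following Python snippet parses BED-style lines and returns the shard paths that overlap a query region. It assumes the first three columns are chromosome, 0-based start, and end (standard BED) and that the fourth column holds the S3 path. The column layout, the `parse_manifest` and `shards_overlapping` helpers, and the example rows are illustrative assumptions, not taken from the actual file.

```python
from typing import NamedTuple


class Shard(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    path: str


def parse_manifest(text: str) -> list[Shard]:
    """Parse BED-style manifest text into Shard records."""
    shards = []
    for line in text.splitlines():
        # Skip blank lines and BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, path = line.split("\t")[:4]
        shards.append(Shard(chrom, int(start), int(end), path))
    return shards


def shards_overlapping(shards, chrom, start, end):
    """Return paths of shards overlapping the half-open interval [start, end)."""
    return [s.path for s in shards
            if s.chrom == chrom and s.start < end and start < s.end]


# Hypothetical manifest rows; the real paths come from the BED file above.
example = (
    "chr1\t0\t1000000\ts3://example-bucket/shard_0001.vcf.gz\n"
    "chr1\t1000000\t2000000\ts3://example-bucket/shard_0002.vcf.gz\n"
    "chr2\t0\t1000000\ts3://example-bucket/shard_0003.vcf.gz\n"
)
manifest = parse_manifest(example)
print(shards_overlapping(manifest, "chr1", 950_000, 1_050_000))
```

If the real manifest uses a different column order, only the unpacking line in `parse_manifest` would need to change.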
The functional annotation files are compressed VCFs (vcf.gz) in which the CHROM, POS, REF, ALT, and FILTER fields are preserved from the site-level aggV3 VCFs.
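By default, VEP writes its per-transcript annotations into the VCF INFO column under the key CSQ, as pipe-delimited values whose order is declared in the corresponding `##INFO` header line. The sketch below shows one way to decode such a field in Python, assuming these files use that default key. The header line, INFO string, gene names, and helper names (`csq_fields`, `parse_csq`) are made up for illustration and are much shorter than a real annotation.

```python
import re


def csq_fields(header_line: str) -> list[str]:
    """Extract the ordered CSQ field names from a VEP ##INFO header line."""
    match = re.search(r'Format: ([^">]+)', header_line)
    if match is None:
        raise ValueError("no 'Format:' section found in header line")
    return match.group(1).strip().split("|")


def parse_csq(info: str, fields: list[str]) -> list[dict[str, str]]:
    """Return one dict per comma-separated CSQ entry in an INFO string."""
    for item in info.split(";"):
        if item.startswith("CSQ="):
            entries = item[len("CSQ="):].split(",")
            return [dict(zip(fields, entry.split("|"))) for entry in entries]
    return []  # no CSQ key present in this record


# Made-up header and INFO string, shortened for illustration.
header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|Consequence|SYMBOL">')
info = "AC=2;CSQ=T|missense_variant|GENE1,T|intron_variant|GENE2"

fields = csq_fields(header)
for annotation in parse_csq(info, fields):
    print(annotation["SYMBOL"], annotation["Consequence"])
```

For production use, an established VCF library is likely a better choice than hand-rolled string splitting; the sketch is only meant to show how the field is laid out.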
Plugins and annotation sources
The following VEP plugins were used for annotation:
| Plugin | Version | File path |
|---|---|---|
| CADD | 1.6 | SNVs - s3://512426816668-public-data-resources/CADD/v1.6/GRCh38_v1.6/no_anno/whole_genome_SNVs.tsv.gz<br>INDELs - s3://512426816668-public-data-resources/CADD/v1.6/GRCh38_v1.6/no_anno/gnomad.genomes.r3.0.indel.tsv.gz |
| REVEL | 1.3 | s3://512426816668-public-data-resources/vep_resources/REVEL/revel_v1.3_GRCh38.tsv.gz |
| UTRannotator | GRCh38 | s3://512426816668-public-data-resources/utrannotator/uORF_5UTR_GRCh38_PUBLIC.txt |
| NMD | NA | NA |
| SpliceAI | 1.3 | SNVs - s3://512426816668-public-data-resources/SpliceAI/Predicting_splicing_from_primary_sequence-66029966/genome_scores_v1.3/spliceai_scores.raw.snv.hg38.vcf.gz<br>INDELs - s3://512426816668-public-data-resources/SpliceAI/Predicting_splicing_from_primary_sequence-66029966/genome_scores_v1.3/spliceai_scores.raw.indel.hg38.vcf.gz |
| SpliceRegion | NA | NA |
| LOFTEE | GRCh38 | Human ancestor fasta - s3://512426816668-public-data-resources/vep_resources/LOFTEE/Build-38/human_ancestor.fa.gz<br>GERP BigWig - s3://512426816668-public-data-resources/vep_resources/LOFTEE/Build-38/gerp_conservation_scores.homo_sapiens.GRCh38.bw<br>Conservation file - s3://512426816668-public-data-resources/vep_resources/LOFTEE/Build-38/loftee.sql |
| AlphaMissense | hg38 | s3://512426816668-public-data-resources/AlphaMissense/AlphaMissense_hg38.tsv.gz |
| MechPredict | NA | s3://512426816668-public-data-resources/MechPredict/MechPredict_input.tsv |
| gnomAD | 4.1 | s3://512426816668-public-data-resources/gnomad/v4.1/gnomad_4.1_subset_allchr.vcf.gz |
Note on gnomAD annotations
Annotations from gnomAD v4.1 are available in the functional annotation VCFs, appearing with the prefix gnomADg_.
gnomAD v4.1 annotations include allele frequencies derived from the genome, exome, and joint call sets. Additionally, we include the statistical test results (stat_union_p_value, stat_union_test_name, and stat_union_gen_ancs) that gnomAD provides to flag variants whose allele frequencies differ significantly between the exome and genome datasets. Read more about gnomAD v4.1 in their announcement article.
The following INFO fields from gnomAD are included in the functional annotation VCFs:
AF_joint, AF_exomes, AF_genomes, AF_joint_XX, AF_joint_XY, AF_joint_afr_XX, AF_joint_afr_XY, AF_joint_afr, AF_joint_ami_XX, AF_joint_ami_XY, AF_joint_ami, AF_joint_amr_XX, AF_joint_amr_XY, AF_joint_amr, AF_joint_asj_XX, AF_joint_asj_XY, AF_joint_asj, AF_joint_eas_XX, AF_joint_eas_XY, AF_joint_eas, AF_joint_fin_XX, AF_joint_fin_XY, AF_joint_fin, AF_joint_mid_XX, AF_joint_mid_XY, AF_joint_mid, AF_joint_nfe_XX, AF_joint_nfe_XY, AF_joint_nfe, AF_joint_remaining_XX, AF_joint_remaining_XY, AF_joint_remaining, AF_joint_sas_XX, AF_joint_sas_XY, AF_joint_sas, faf95_joint, faf99_joint, faf95_joint_afr, faf99_joint_afr, faf95_joint_amr, faf99_joint_amr, faf95_joint_eas, faf99_joint_eas, faf95_joint_mid, faf99_joint_mid, faf95_joint_nfe, faf99_joint_nfe, faf95_joint_sas, faf99_joint_sas, stat_union_p_value, stat_union_test_name, stat_union_gen_ancs, AF_exomes_XX, AF_exomes_XY, AF_exomes_afr_XX, AF_exomes_afr_XY, AF_exomes_afr, AF_exomes_amr_XX, AF_exomes_amr_XY, AF_exomes_amr, AF_exomes_asj_XX, AF_exomes_asj_XY, AF_exomes_asj, AF_exomes_eas_XX, AF_exomes_eas_XY, AF_exomes_eas, AF_exomes_fin_XX, AF_exomes_fin_XY, AF_exomes_fin, AF_exomes_mid_XX, AF_exomes_mid_XY, AF_exomes_mid, AF_exomes_nfe_XX, AF_exomes_nfe_XY, AF_exomes_nfe, AF_exomes_remaining_XX, AF_exomes_remaining_XY, AF_exomes_remaining, AF_exomes_sas_XX, AF_exomes_sas_XY, AF_exomes_sas, faf95_exomes, faf99_exomes, faf95_exomes_afr, faf99_exomes_afr, faf95_exomes_amr, faf99_exomes_amr, faf95_exomes_eas, faf99_exomes_eas, faf95_exomes_mid, faf99_exomes_mid, faf95_exomes_nfe, faf99_exomes_nfe, faf95_exomes_sas, faf99_exomes_sas, AF_genomes_XX, AF_genomes_XY, AF_genomes_afr_XX, AF_genomes_afr_XY, AF_genomes_afr, AF_genomes_ami_XX, AF_genomes_ami_XY, AF_genomes_ami, AF_genomes_amr_XX, AF_genomes_amr_XY, AF_genomes_amr, AF_genomes_asj_XX, AF_genomes_asj_XY, AF_genomes_asj, AF_genomes_eas_XX, AF_genomes_eas_XY, AF_genomes_eas, AF_genomes_fin_XX, AF_genomes_fin_XY, AF_genomes_fin, AF_genomes_mid_XX, AF_genomes_mid_XY, 
AF_genomes_mid, AF_genomes_nfe_XX, AF_genomes_nfe_XY, AF_genomes_nfe, AF_genomes_remaining_XX, AF_genomes_remaining_XY, AF_genomes_remaining, AF_genomes_sas_XX, AF_genomes_sas_XY, AF_genomes_sas, faf95_genomes, faf99_genomes, faf95_genomes_afr, faf99_genomes_afr, faf95_genomes_amr, faf99_genomes_amr, faf95_genomes_eas, faf99_genomes_eas, faf95_genomes_nfe, faf99_genomes_nfe, faf95_genomes_sas, faf99_genomes_sas
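To illustrate how the gnomADg_ prefix combines with the field names listed above, the following Python sketch looks up a prefixed value in an already-parsed annotation record and converts it to a float. It assumes the annotation is available as a dictionary of strings (for example, one decoded VEP annotation entry) and that missing values appear as empty strings. The `gnomad_frequency` and `is_rare` helpers, the example record, and the 0.001 threshold are hypothetical choices for illustration.

```python
from typing import Optional

PREFIX = "gnomADg_"


def gnomad_frequency(annotation: dict[str, str], field: str) -> Optional[float]:
    """Return a gnomAD value as a float, or None if absent or non-numeric."""
    raw = annotation.get(PREFIX + field, "")
    # VEP may join several values with '&'; taking the first one
    # is a simplification made for this sketch.
    raw = raw.split("&")[0]
    try:
        return float(raw)
    except ValueError:
        return None


def is_rare(annotation, field="AF_joint", threshold=0.001):
    """Treat variants without a gnomAD value as rare; threshold is arbitrary."""
    af = gnomad_frequency(annotation, field)
    return af is None or af < threshold


# Hypothetical annotation record with made-up values.
annotation = {
    "SYMBOL": "GENE1",
    "gnomADg_AF_joint": "0.00025",
    "gnomADg_AF_exomes": "",
}
print(gnomad_frequency(annotation, "AF_joint"))
print(is_rare(annotation))
```

Whether a variant absent from gnomAD should count as rare depends on the analysis; the default here is only one possible convention.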
Additional annotations include:
| Resource | Version | File path |
|---|---|---|
| GREEN-VARAN | 1.3.2 | s3://512426816668-public-data-resources/greenvaran/GRCh38_GREEN-DB.bed.gz |
| ClinVar | GRCh38 - 20250923 | s3://512426816668-public-data-resources/clinvar/20250923/clinvar_20250923.vcf.gz |
| PhyloP | GRCh38 - PhyloP100way | s3://512426816668-public-data-resources/phylop100way/hg38.phyloP100way.bw |