
File Manifest

The root file path for all aggV2 data is:

/gel_data_resources/main_programme/aggregation/aggregate_gVCF_strelka/aggV2/

Prepend this root to an extended file path from the table to obtain the full file path.
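
As a minimal sketch of that path construction, the Python snippet below joins the root quoted above with one of the extended file paths from the table. The constant `AGGV2_ROOT` and the helper `full_path` are hypothetical names introduced here for illustration only.

```python
from pathlib import PurePosixPath

# Root path quoted in this manifest.
AGGV2_ROOT = PurePosixPath(
    "/gel_data_resources/main_programme/aggregation/aggregate_gVCF_strelka/aggV2"
)


def full_path(extended_path: str) -> str:
    """Join the aggV2 root with an extended file path from the table."""
    # Strip any leading slash so the extended path cannot replace the root.
    return str(AGGV2_ROOT / extended_path.lstrip("/"))


# Example: the chunk names BED file listed in the table.
chunk_bed = full_path("additional_data/chunk_names/aggV2_chunk_names.bed")
print(chunk_bed)
```

Paths that end in `.*` in the table are patterns covering several files, so they would need expanding (for example with a glob) rather than being opened directly.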

| Files and descriptions | Extended file path |
|---|---|
| Aggregated genomic data | genomic_data/gel_mainProgramme_aggV2_.vcf.gz |
| Aggregated functional annotation data using VEP 98 | functional_annotation/VEP/gel_mainProgramme_aggV2__VEPannot.vcf.gz |
| Aggregated functional annotation data using VEP 99 | functional_annotation/VEP_99/gel_mainProgramme_aggV2__VEPannot.vcf.gz |
| Test data: 5 chunks of 1,000 variants each across 78,195 samples. Useful for testing scripts and workflows. Index files are also present. | additional_data/test_data/gel_mainProgramme_aggV2_.vcf.gz |
| Sample QC statistics. A tab-delimited version of the aggregate_gvcf_sample_stats table in LabKey. | additional_data/aggregate_gvcf_sample_stats/aggregate_gvcf_sample_stats_v10_78195.tsv |
| Chunk names. A seven-column tab-delimited file of chunk names in aggV2 with full file paths to genotype and functional annotation VCFs. 0-indexed BED format. | additional_data/chunk_names/aggV2_chunk_names.bed |
| Chunk names. A seven-column tab-delimited file of chunk names in aggV2 with full file paths to genotype and functional annotation VCFs. | additional_data/chunk_names/aggV2_chunk_names.tsv |
| aggV2 sample list. All sample IDs in aggV2. | additional_data/sample_list/aggV2_sampleIds_mpv10_78195.tsv |
| XX female participant list | additional_data/sample_sex/xx_females_illumina_ploidy_samples_40653.tsv |
| XY male participant list | additional_data/sample_sex/xy_males_illumina_ploidy_samples_35822.tsv |
| High LD exclusion regions | additional_data/PCs_relatedness/MichiganLD_liftover_exclude_regions.txt |
| High confidence independent (MAF > 0.05) SNP binary files | additional_data/HQ_SNPs/GELautosomes_LD_pruned_1kgp3Intersect_maf0.05_mpv10.* |
| High confidence independent (MAF > 0.01) SNP binary files | additional_data/HQ_SNPs/MAF1/GELautosomes_LD_pruned_1kgp3Intersect_maf0.01_mpv10.* |
| PCs 1-50 across all aggV2 participants | additional_data/PCs_relatedness/PCA/GEL_aggV2_MAF5_mp10.eigenvec |
| Eigenvalues for unrelated aggV2 participants | additional_data/PCs_relatedness/PCA/GEL_aggV2_MAF5_mp10.eigenval |
| Proportion of variance explained for PCs on unrelated aggV2 participants | additional_data/PCs_relatedness/PCA/GEL_aggV2_MAF5_mp10.propvar |
| Pairwise kinship estimates for related individuals (threshold > 0.0442) | additional_data/PCs_relatedness/relatedness/GEL_aggV2_MAF5_mp10_0.0442.kin0 |
| Kinship matrix for all individuals in aggV2 (stored in triangle, binary format) | additional_data/PCs_relatedness/relatedness/GEL_aggV2_MAF5_mp10.king.bin |
| List of related sample platekeys (threshold > 0.0442) | additional_data/PCs_relatedness/relatedness/GEL_aggV2_MAF5_mp10.king.cutoff.related.id |
| List of unrelated sample platekeys (threshold < 0.0442) | additional_data/PCs_relatedness/relatedness/GEL_aggV2_MAF5_mp10.king.cutoff.unrelated.id |
| All platekeys assessed for relatedness | additional_data/PCs_relatedness/relatedness/GEL_aggV2_MAF5_mp10.king.id |
| Eigenvalues from PCA on 1KGP3 unrelated individuals using aggV2 HQ SNPs | additional_data/ancestry/1KGP3_PCs/1KGP3_MAF5.eigenval |
| Eigenvectors from PCA on 1KGP3 unrelated individuals using aggV2 HQ SNPs | additional_data/ancestry/1KGP3_PCs/1KGP3_MAF5.eigenvec |
| PC loadings from PCA on 1KGP3 unrelated individuals using aggV2 HQ SNPs | additional_data/ancestry/1KGP3_PCs/1KGP3_MAF5.pcl |
| Eigenvectors for the projection of aggV2 samples onto the PC loadings of the 1KGP3 unrelated individuals, using aggV2 HQ SNPs | additional_data/ancestry/1KGP3_projection_GEL/GEL_aggV2_proj_on_1KGP3_MAF5_mp10.eigenvec |
| Genetically inferred ancestry probabilities based on super-populations from the 1KGP3 | additional_data/ancestry/MAF5_superPop_predicted_ancestries.tsv |
| Genetically inferred ancestry probabilities based on sub-populations from the 1KGP3 | additional_data/ancestry/MAF1/MAF1_actg_filtered_subPop_predicted_ancestries.tsv |
| The VEP severity scale used in the bcftools +split-vep plugin | additional_data/VEP_severity_scale_2020.txt |
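
The pairwise kinship file listed above keeps every pair with kinship above 0.0442. An analysis may want a stricter cut-off, and the Python sketch below shows one way to re-filter such a table. It assumes the `.kin0` file follows the PLINK 2 layout (tab-delimited, header line beginning with `#`, with columns named `IID1`, `IID2` and `KINSHIP`). This manifest does not confirm that layout, so check the header of the real file first. The sample IDs and values are invented for illustration, and 0.0884 is the cut-off commonly used with KING for second-degree or closer relatives.

```python
import csv

# Commonly used KING cut-off for second-degree or closer relatives.
SECOND_DEGREE = 0.0884


def read_kin0(lines):
    """Yield one dict per data row of a tab-delimited .kin0 table."""
    reader = csv.reader(lines, delimiter="\t")
    # Assumed PLINK 2 layout: the header line starts with '#'.
    header = [name.lstrip("#") for name in next(reader)]
    for fields in reader:
        if fields:  # skip blank lines
            yield dict(zip(header, fields))


def pairs_above(lines, threshold):
    """Return (IID1, IID2, kinship) for pairs with kinship > threshold."""
    return [
        (row["IID1"], row["IID2"], float(row["KINSHIP"]))
        for row in read_kin0(lines)
        if float(row["KINSHIP"]) > threshold
    ]


# Hypothetical rows standing in for the real file contents.
example = [
    "#IID1\tIID2\tNSNP\tHETHET\tIBS0\tKINSHIP",
    "SAMPLE_A\tSAMPLE_B\t60000\t0.15\t0.001\t0.2510",
    "SAMPLE_A\tSAMPLE_C\t60000\t0.12\t0.020\t0.0510",
    "SAMPLE_D\tSAMPLE_E\t60000\t0.13\t0.010\t0.1200",
]
close_pairs = pairs_above(example, SECOND_DEGREE)
print(close_pairs)
```

To run this against the real file, pass an open file handle (for example `open(path, newline="")`) in place of the `example` list.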