Archive training session

Past training sessions may include information that is no longer true, in either the presentation or the Q&A. Please double check against the relevant documentation pages.

Using genomic data to build cohorts, June 2026

Many analyses start from a gene or a list of genes, and you want to find all participants with variants in those genes. Alternatively, you may have a set of variant loci and want to find all participants who are heterozygous or homozygous for alternative alleles at those loci.

In this training session, we will look at both no-code tools for finding variants and command-line tools on the high-performance cluster (HPC), including GEL-provided workflows.

We will look at the LabKey tiering tables, which list all variants considered plausibly pathogenic, and learn how to filter these by gene or locus. We will use the Integrated Variant Analysis tool (IVA) to search for variants by gene or locus, plus other parameters such as proband and parental genotypes, consequences and population frequencies. For each variant found, we can then pull out the participants who carry it. The training will also cover how you can use APIs to fetch the same data programmatically.
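As a rough illustration of filtering a tiering table by a gene list programmatically, the Python sketch below builds a SQL query and passes it to the LabKey Python client. The table name `tiering_data`, the column name `genomic_feature_hgnc` and the schema name `lists` are assumptions made for this example and are not confirmed by this page; the server and project values must come from the current documentation. The helper function names are hypothetical.

```python
import re

# Assumed names for illustration only; check the LabKey documentation
# for the table and column names in your data release.
TIERING_TABLE = "tiering_data"
GENE_COLUMN = "genomic_feature_hgnc"

# Accept only characters expected in a gene symbol, so that the list
# can be placed safely inside a SQL string.
_GENE_SYMBOL = re.compile(r"^[A-Za-z0-9][A-Za-z0-9.\-]*$")


def build_tiering_query(genes, table=TIERING_TABLE, gene_column=GENE_COLUMN):
    """Return a SQL string selecting tiered variants in the given genes."""
    cleaned = []
    for gene in genes:
        symbol = gene.strip()
        if not _GENE_SYMBOL.match(symbol):
            raise ValueError(f"Unexpected gene symbol: {gene!r}")
        if symbol not in cleaned:  # drop duplicates, keep input order
            cleaned.append(symbol)
    if not cleaned:
        raise ValueError("At least one gene symbol is required")
    gene_list = ", ".join(f"'{g}'" for g in cleaned)
    return f"SELECT * FROM {table} WHERE {gene_column} IN ({gene_list})"


def fetch_tiered_variants(genes, server, project, context_path="labkey"):
    """Run the query with the LabKey Python client and return the rows.

    `server` and `project` are supplied by the caller; the schema name
    "lists" is an assumption.
    """
    # Imported lazily so the query builder works without the labkey package.
    from labkey.api_wrapper import APIWrapper

    api = APIWrapper(server, project, context_path, use_ssl=True)
    result = api.query.execute_sql(
        schema_name="lists", sql=build_tiering_query(genes)
    )
    return result["rows"]
```

Calling `build_tiering_query(["BRCA1", "TP53"])` on its own is a quick way to inspect the SQL before running it against a real table.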

We will also use the Small Variant workflow and Structural Variant workflow that allow us to identify all variants (short and structural, respectively) in a list of genes, pulling out the platekeys of participants with these variants. To find individuals with variants at particular loci, we will use bcftools with the aggregated VCF files on the HPC.
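The locus-based approach can be sketched in Python as a thin wrapper around `bcftools query`. This is a minimal sketch, not the method shown in the session: it assumes `bcftools` is on the `PATH`, that the VCF is bgzipped and indexed (required for `-r`), and that the VCF path and region are supplied by you. The function names and the sample IDs in the usage note are hypothetical.

```python
import subprocess

# One output line per sample and site: CHROM, POS, REF, ALT, SAMPLE, GT.
# bcftools expands the \t and \n escapes itself, hence the raw string.
QUERY_FORMAT = r"[%CHROM\t%POS\t%REF\t%ALT\t%SAMPLE\t%GT\n]"


def build_bcftools_command(vcf_path, region):
    """Build a `bcftools query` call restricted to one region."""
    return ["bcftools", "query", "-r", region, "-f", QUERY_FORMAT, vcf_path]


def classify_genotype(gt):
    """Return 'het', 'hom_alt', or None for reference or missing calls."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:  # missing or partially missing calls are skipped
        return None
    if all(a == "0" for a in alleles):
        return None
    # Identical non-reference alleles; a haploid ALT call also lands here.
    if len(set(alleles)) == 1:
        return "hom_alt"
    # Covers 0/1 as well as two different ALT alleles such as 1/2.
    return "het"


def parse_carriers(query_output):
    """Map sample ID to a list of (chrom, pos, ref, alt, zygosity)."""
    carriers = {}
    for line in query_output.splitlines():
        if not line.strip():
            continue
        chrom, pos, ref, alt, sample, gt = line.split("\t")
        zygosity = classify_genotype(gt)
        if zygosity is not None:
            carriers.setdefault(sample, []).append(
                (chrom, int(pos), ref, alt, zygosity)
            )
    return carriers


def find_carriers(vcf_path, region):
    """Run bcftools and return carriers of ALT alleles in the region."""
    result = subprocess.run(
        build_bcftools_command(vcf_path, region),
        check=True, capture_output=True, text=True,
    )
    return parse_carriers(result.stdout)
```

For example, `parse_carriers("chr1\t100\tA\tG\tSAMPLE_A\t0/1\n")` returns a dictionary with one heterozygous entry for `SAMPLE_A`. Filtering genotypes in Python rather than with a bcftools expression keeps the zygosity logic explicit and easy to test without the real data.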

Timetable

13.30 Introduction and admin
13.35 LabKey tables of variant genotypes
13.45 Finding genotypes with IVA and Cohort Browser
14.00 The Small Variant and Structural Variant workflows
14.15 Aggregated variant files
14.30 Using bcftools to query aggregates
14.45 Getting help and questions

Learning objectives

After this training you will be able to:

Know which LabKey tables contain tiered variant data.
Use the IVA Variant Browser to filter variants.
Differentiate between the Small Variant and SV/CNV workflows and know when to use them.
Understand the contents of the aggregated variant files: AggV3, AggV2 and SomAgg.
Run pipelines and tools on the GEL HPC.

Target audience

This training is aimed at researchers:

working with the Genomics England Research environment
working with genetic and genomic variation data
who can work on the command line to run tools and scripts

Date

9th June 2026

Materials

You can access the redacted slides and video below. All sensitive data has been censored. You can access and copy code from the Jupyter and R notebooks used in the training at:

/gel_data_resources/example_scripts/workshop_scripts/genotypes_2026

Slides

Download the slides

Video

Give us feedback on this tutorial

Q&A

Hi Emily, if I have a list of genes, can I copy this into the environment, or would I have to get them from within the environment?

live answered