Genomics England Research Environment User Guide
Release V6 (28/02/2019)
Release notes
Data dictionary