Genomics England Research Environment User Guide
Release V7 (25/07/2019)
Release notes
Data dictionary
Last update: November 3, 2023