NHS GMS data release change summary
nhs-gms-release_v5_2025-08-28
New table
Cancer variant tiering information is now available in the cancer_tier_and_domain_variants table.
Changes to existing tables
The participant table now includes the column programme_consent_status. This field includes the up-to-date consent status (Consenting, Withdrawn (Partial) or Withdrawn (Full)) of the participant. When you start a new research project, you must filter your list of participants to remove any non-consenting participants.
The cancer_analysis table now includes a referral_type column, which describes whether a cancer case included both germline and somatic samples (matched_normal) or only the tumour sample (tumour_only). In this release all cases are matched normal.
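The consent filter described for the participant table can be sketched as follows. This is a minimal illustration, not an official query: the rows and participant IDs are hypothetical stand-ins for the real table, and it assumes that only participants with the status Consenting should be kept. Check your project's requirements before treating Withdrawn (Partial) participants the same way as Withdrawn (Full).

```python
# Hypothetical rows standing in for the participant table. Only the
# programme_consent_status column and its three values come from the
# release notes; the IDs are made up for illustration.
participants = [
    {"participant_id": "pp00000000001", "programme_consent_status": "Consenting"},
    {"participant_id": "pp00000000002", "programme_consent_status": "Withdrawn (Partial)"},
    {"participant_id": "pp00000000003", "programme_consent_status": "Withdrawn (Full)"},
]


def consenting_only(rows):
    """Keep rows whose consent status is exactly 'Consenting' (assumed policy)."""
    return [row for row in rows if row.get("programme_consent_status") == "Consenting"]


cohort = consenting_only(participants)
print([row["participant_id"] for row in cohort])
```

Rows with a missing status are dropped as well, which is the conservative choice when the consent state is unknown.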
nhs-gms-release_v4_2024-08-22
Updated 19/12/24 - New secondary clinical datasets
This release includes secondary clinical data, i.e. medical history, including NHSE Hospital Episode Statistics and Office for National Statistics mortality data. The information is provided in the following tables:
- hes_ae: Hospital Episode Statistics Accident and Emergency; contains historic records of A&E attendances.
- hes_apc: Hospital Episode Statistics Admitted Patient Care; contains historic records of admissions into secondary care.
- hes_cc: Hospital Episode Statistics Critical Care; contains historic records of admissions into critical care.
- hes_op: Hospital Episode Statistics Outpatient; contains historic records of outpatient attendances.
- cancer_registry: medical information about the tumour.
- ecds: Main dataset of urgent and emergency care. Expands hes_ae and will replace it entirely in the future.
- mortality: Lists the Office for National Statistics' cause of death records.
More information on these datasets can be found in the Common clinical data page, or in the Data Dictionary.
Currently, this secondary data is only included for participants who were available in release 3; we do not have secondary data for participants who were added in release 4.
Changes to existing tables
cancer_analysis
Each participant now has only one row per cancer case. An additional filter on the referral status is included in GMS data release v4 to exclude statuses that are not active. This fixes an issue in GMS data release v3 that caused duplicate tumour_uid values to be present.
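If you want to confirm that a cancer_analysis extract is free of the duplicate tumour_uid issue, a check along these lines may help. It is a sketch only: the rows and identifier values are hypothetical, and real data would come from the table itself.

```python
from collections import Counter

# Hypothetical cancer_analysis rows; the third row mimics the kind of
# duplicate described for release v3.
cancer_analysis = [
    {"participant_id": "pp00000000001", "tumour_uid": "tumour-a"},
    {"participant_id": "pp00000000002", "tumour_uid": "tumour-b"},
    {"participant_id": "pp00000000002", "tumour_uid": "tumour-b"},
]


def duplicated_tumour_uids(rows):
    """Return the tumour_uid values that occur on more than one row, sorted."""
    counts = Counter(row["tumour_uid"] for row in rows)
    return sorted(uid for uid, n in counts.items() if n > 1)


print(duplicated_tumour_uids(cancer_analysis))
```

An empty list means every tumour_uid appears on exactly one row of the extract.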
report_outcome_questionnaire
This table was previously called gmc_exit_questionnaire. It has been renamed to report_outcome_questionnaire to align with what the questionnaires are called in GMS.
LabKey UI datatype changes
There have been improvements to the datatypes in the LabKey UI for the following tables.
improved tables
| Table | Field | Previous datatype | Updated datatype |
|----------------|------------------------------|--------------------------------------------|-----------------------------------------|
| sample | collection_date | varchar | timestamp (`yyyy-MM-dd` format) |
| | din_value_glh | integer | decimal |
| | percentage_dna_glh | integer | decimal |
| panels_applied | panel_identifier | integer | varchar |
| tiering_data | father_affected | boolean | varchar |
| | mother_affected | boolean | varchar |
| exomiser | father_affected | boolean | varchar |
| | mother_affected | boolean | varchar |
| | poly_phen | varchar | decimal |
| | mutation_taster | varchar | decimal |
| | sift | varchar | decimal |
| av_patient | embarkation | boolean | varchar |
| sact | administration_date | varchar | timestamp (`yyyy-MM-dd` format) |
| | date_of_final_treatment | varchar | timestamp (`yyyy-MM-dd` format) |
| | chemo_radiation | varchar | boolean |
| | regimen_mod_stopped_early | varchar | boolean |
| | regimen_mod_time_delay | varchar | boolean |
| | start_date_of_cycle | varchar | timestamp (`yyyy-MM-dd` format) |
| | start_date_of_regimen | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| | date_decision_to_treat | varchar | timestamp (`yyyy-MM-dd` format) |
| rtds | proceduredate | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| | timeofexposure | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`HH:mm:ss` format) |
| | treatmentstartdate | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| | earliestclinappropriatedate | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| | decisiontotreatdate | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| | apptdate | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| av_treatment | eventdate | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| av_tumour | diagnosisdate1 | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| | diagnosisdate2 | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| | diagnosisdatebest | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| | statusofregistration | boolean | varchar |
| | breslow | varchar | decimal |
| | first_hosp_date | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
| | date_first_surgery | timestamp (`yyyy-MM-dd HH:mm:ss` format) | timestamp (`yyyy-MM-dd` format) |
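Scripts written against the earlier datatypes may still encounter date values in the older `yyyy-MM-dd HH:mm:ss` form. The helper below is a hedged sketch of one way to accept both forms; it assumes the values reach your code as strings (for example from an exported file), which may not match how your client returns them. The function name is hypothetical. The patterns `yyyy-MM-dd` and `HH:mm:ss` correspond to `%Y-%m-%d` and `%H:%M:%S` in Python.

```python
from datetime import date, datetime


def parse_release_date(value: str) -> date:
    """Parse 'yyyy-MM-dd', also accepting the older 'yyyy-MM-dd HH:mm:ss' form."""
    for fmt in ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # try the next known format
    raise ValueError(f"Unrecognised date format: {value!r}")


print(parse_release_date("2024-08-22"))
print(parse_release_date("2024-08-22 13:45:00"))
```

Both calls yield the same calendar date, because the time component is discarded to match the updated datatype.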
nhs-gms-release_v3_2024-03-18
New data sets
This release includes secondary clinical data, i.e. medical history, from the National Cancer Registration and Analysis Service (NCRAS). The information is provided in the following tables:
- av_imd: income deprivation domain
- av_patient: demographics from the Cancer Registration and information about death
- av_rtd: routes to diagnosis
- av_treatment: treatment received for each participant
- av_tumour: medical information about the tumour
- rtds: radiotherapy dataset
- sact: systemic anti-cancer therapy

More information on these datasets can be found in the Cancer-specific clinical data page.
Changes to existing tables
participant
The participant table now contains two additional fields:
- Category: this provides the category given to the referral (Cancer/Rare Diseases)
- Referral id: this provides the id of the referral submitted to GMS
observation
The observation table now contains two additional fields:
- Normalised Hpo Id
- Normalised Hpo Term
gmc_exit_questionnaire
The Additional Comments and Publications columns in the gmc_exit_questionnaire table now contain information. Personal Identifiable Data (PID) has been masked by replacing it with '---'.
nhs-gms-release_v2_2023-02-28
Some tables were already present in the 100K data and therefore follow a similar format. Relative to the 100K format, the following changes apply to the similarly named tables of the NHS GMS data.
While various subtle changes have been made to the NHS GMS tables, we list some of the most important ones below. For example, with NHS GMS release v2 we have reintroduced the cancer_analysis table.
General Changes
- This release sees the introduction of partial referrals. Partial referrals are referrals for more than one participant, and for which only a proportion of the participants have consented for research. This may limit some of the data available for the participants within a partial referral who did consent for research. We are only releasing data for partial referral participants who did consent for research.
- We have had to make a change to our approach of encrypting participant, referral, and sample IDs. Therefore, you will unfortunately not be able to find the same IDs between NHS-GMS release v1 and v2. As the first release contained a relatively minimal dataset, we hope the impact of this change remains minimal. The majority of the participants included in NHS GMS release v1 will be part of release v2, but under different participant and referral IDs.
Bioinformatics data
genome_file_paths_and_types
- Structural Variant (SV) VCFs for rare disease participants (`*.diploidSV.vcf.gz`) are now provided per individual instead of as a single VCF containing the SVs of all individuals in a given family. In NHS-GMS release v1 these were still provided on a family basis, but with the introduction of partial referrals we aimed to maintain the possibility to study SVs as much as possible.
cancer_analysis
- Introduction of `referral_id` as a case reference ID. Participants can be part of multiple referrals.
- NHS GMS columns `clinical_indication_code` and `clinical_indication_full_name` will provide detailed information on the tumour type (also found in the `referral` table).
- 100K column `tumour_id` has been replaced with `tumour_uid` for NHS-GMS. The `tumour_uid` will enable the linking of tumour morphology and topography data across clinical tables.
- 100K column `tumour_clinical_sample_time` has been replaced with `tumour_sample_clinical_sample_date_time` and the germline equivalent added as `germline_sample_clinical_sample_date_time`. However, this data is no longer submitted for every referral, so is absent for many samples.
- NHS GMS columns `somatic_tinc_vcf` and `somatic_tinc_sv_vcf` are currently empty in the cancer_analysis table. This is not an error and is subject to change in future releases, but we decided to already include the columns for this data.
- 100K columns `analysis_csv_filepath` and `analysis_html_filepath` have been replaced with `cancer_report_reported_variants_csv` and `cancer_report_supplementary_html`, respectively. In addition, we have now also provided the smaller summarised report in the `cancer_report_html` column.
- The annotated VCFs, CSVs and HTMLs can now be found in a single interpretation folder to increase visibility of data belonging to the same interpretation request.
- While we expect that this table will receive more additions to increase its utility, we look forward to suggestions from the Research Community as to what may be useful columns or information.
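When porting scripts written against the 100K cancer_analysis table, the column replacements listed above can be captured in a small lookup. This is an illustrative sketch: the mapping covers column names only and does not imply that the values in a renamed column are identical to their 100K counterparts, and the helper name is hypothetical.

```python
# Column renames taken from the cancer_analysis changes listed above
# (100K name -> NHS GMS name).
HUNDRED_K_TO_GMS = {
    "tumour_id": "tumour_uid",
    "tumour_clinical_sample_time": "tumour_sample_clinical_sample_date_time",
    "analysis_csv_filepath": "cancer_report_reported_variants_csv",
    "analysis_html_filepath": "cancer_report_supplementary_html",
}


def to_gms_column(name):
    """Return the NHS GMS column name for a 100K one; other names pass through."""
    return HUNDRED_K_TO_GMS.get(name, name)


old_columns = ["participant_id", "tumour_id", "analysis_csv_filepath"]
print([to_gms_column(column) for column in old_columns])
```

Passing unknown names through unchanged keeps the helper safe to apply to a full column list.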
gmc_exit_questionnaire
* While no changes have been made to this table, we want to reiterate that the columns additional_comments and publications have been intentionally made NA in this release as well. This remains subject to change in future releases.
Clinical Data
Three new fields have been added to this release of the clinical datasets:
- `referral.date_submitted`: this provides the date when the referral was first submitted to GMS
- `plated_sample.date_of_dispatch`: this provides the date when the plated sample was dispatched to the sequencing facility
- `referral.category`: this provides the category given to the referral (Cancer/Rare Diseases)

Several of the extraneous guid fields have been removed from this release of the clinical datasets, specifically:

- `condition.uid`
- `observation_component.uid`
- `participant.uid`
- `referral.uid`
- `referral_participant.uid`
- `referral_test.uid`
- `tumour_morphology.uid`
- `tumour_topography.uid`
nhs-gms-release_v1_2022-06-15
This data release represents the baseline for subsequent releases.
Some tables were already present in the 100K data and therefore follow a similar format. Relative to the 100K format, the following changes apply to the similarly named tables of the NHS GMS data.
- The `participant_id`'s have changed format and are now a string with the following logic: `ppXXXXXXXXXXX`
- `sequencing_report` and `genome_file_paths_and_types`
    - Column `family_id` has been removed. From now on, cases are referred to as referrals and family members will be part of a single referral. Effectively, `referral_id` replaces `family_id`.
    - Column `laboratory_sample_id` has been removed and will not be available for NHS GMS data.
    - Discrepancy between `plate_key` vs `platekey` has been streamlined. From now on, only references to `platekey` are used.
    - Column `associated_interpretation_request_id` has been included. From now on researchers will have a better view on which CRAM files have been used for a given interpretation request.
    - Joint-called VCFs are now readily available in `/gel_data_resources/` and can be queried from either table.
    - Column `data_format` has been included. Within our pipeline, singletons will go through the same pipeline as multi-member families and are thus considered 'joint-called' even when it concerns a singleton. Samples called without other family members are marked as `single_sample` in the data_format column.
    - More granularity has been provided in the `file_sub_type` column (i.e. more types).
    - Column `delivery_date` has been streamlined across the table and now only contains `YYYY-MM-DD`. Time stamps have been removed.
- `tiering_data` and `exomiser`
    - Columns `rare_diseases_family_id` and `family_id` have been removed. From now on, cases are referred to as referrals and family members will be part of a single referral. Effectively, `referral_id` replaces the utility of `rare_diseases_family_id` and `family_id`.
    - Discrepancy between `sample_id` vs `platekey` has been streamlined. From now on, only references to `platekey` are used.
    - Discrepancy between `genome_build` vs `assembly` has been streamlined. From now on, only references to `genome_build` are used.
    - Columns `full_brothers_affected` and `full_sisters_affected` have been removed. This has been replaced by `full_siblings_affected` and indicates the number of affected full siblings.
    - Column `participant_phenotypic_sex` will be NA in this release. This is subject to change in future releases.
- `panels_applied`
    - Column `rare_diseases_family_id` has been removed. From now on, cases are referred to as referrals and family members will be part of a single referral. Effectively, `referral_id` replaces the utility of `family_id`.
    - Discrepancy between `sample_id` vs `platekey` has been streamlined. From now on, only references to `platekey` are used.
- `tiered_variants_frequency`
    - A large number of columns will not be available for the initial release. The primary reason is their unavailability (may change) in our backend systems as changes have been made between the 100K pipeline and the NHS GMS pipeline. This is subject to change in future releases.
- `gmc_exit_questionnaire`
    - Column `family_id` has been removed. From now on, cases are referred to as referrals and family members will be part of a single referral. Effectively, `referral_id` replaces the utility of `family_id`.
    - Discrepancy between `genome_build` vs `assembly` has been streamlined. From now on, only references to `genome_build` are used.
    - Columns `additional_comments` and `publications` have been intentionally made NA in this release. This is subject to change in future releases.
- `participant`, `plated_sample` and `sample`
    - A large number of the columns will not be available for this initial release. This is subject to change in future releases.
The data model for a number of the clinical tables is different to that in the 100,000 Genomes Project main programme releases. The below outlines where you would find the equivalent data in the main programme release.
- `condition`, `observation` and `observation_component`
    - Data found in these tables can be found in the main programme tables `rare_disease_participant_disease` and `rare_disease_participant_phenotype`
- `referral` and `referral_participant`
    - For NHS GMS, cases are referred to as referrals and family members will be part of a single referral. Effectively, `referral_id` replaces the utility of `family_id` and the referral tables replace the utility of the `rare_disease_pedigree`, `rare_disease_pedigree_member` and `rare_disease_family` tables
    - The concept of `pedigree_member` doesn't exist in NHS GMS, only data on currently consented individuals is included