
Staging data (cancer)

The 100kGP cancer_staging_consolidated LabKey table compiles in one place all the staging information available in the Genomics England research environment. It contains the subset of participants from the cancer programme who have successfully passed through the Genomics England interpretation pipeline (and are available in the cancer_analysis LabKey table) for whom at least one piece of staging information is found.

Please note that cancer_staging_consolidated contains no new staging information, i.e. no information that is not already in other LabKey tables.

Description

Staging information is located in the Research Environment in the following three datasets:

  • cancer_participant_tumour (Genomics England primary clinical data)
  • av_tumour (secondary clinical data from NCRAS)
  • sact (secondary clinical data from NHSE)

These datasets differ in how complete their staging information is. In addition, all tables in LabKey are linked via participant_id, which is not sufficient for cancer staging data: one participant can have multiple tumours, and stage evolves over time. To make staging information easily accessible, we have brought together in a single table the staging information found in the datasets above for each tumour sample in cancer_analysis.

Where available, tumour_id was used to link samples with our primary clinical data; however, not all samples had a tumour_id. In these cases, as well as for the secondary clinical data, samples have been linked using a dictionary that maps ICD-10 codes found in the clinical data to the disease_type in cancer_analysis. The dictionary was created internally and validated by one of our pathologists.
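The dictionary-based linking described above can be sketched as follows. This is an illustration only: the mapping entries, the matching on the three-character ICD-10 category and the function name are assumptions, and the internal, pathologist-validated dictionary is not reproduced here.

```python
# Hypothetical excerpt of an ICD-10 -> disease_type dictionary.
# The real dictionary was built internally and is not shown on this page.
ICD10_TO_DISEASE_TYPE = {
    "C50": "BREAST",
    "C61": "PROSTATE",
    "C18": "COLORECTAL",
    "C20": "COLORECTAL",
}


def icd10_matches_disease_type(icd10_code, disease_type):
    """Return True if the ICD-10 code maps to the sample's disease_type.

    Assumption: matching uses the three-character ICD-10 category
    (e.g. "C50.9" -> "C50"); the actual matching rule may differ.
    """
    if not icd10_code or not disease_type:
        return False
    category = icd10_code.strip().upper()[:3]
    return ICD10_TO_DISEASE_TYPE.get(category) == disease_type.strip().upper()


print(icd10_matches_disease_type("C50.9", "BREAST"))  # True
print(icd10_matches_disease_type("C61", "BREAST"))    # False
```

A clinical record would only be considered for a sample when this kind of check succeeds for the participant in question.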

Finally, we only include staging information in the cancer_staging_consolidated table when the clinical stage information was collected no more than one year (12 months) from the date when the tumour sample was collected. If you would like to use a smaller window, you can do so by filtering on the "interval_min" column of the cancer_staging_consolidated table (please note that interval_min is counted in days). If multiple pieces of staging information are available for a tumour sample within the one-year window, only one entry per source dataset (cancer_participant_tumour, av_tumour, sact) is included: the staging information obtained closest to the date when the tumour sample was collected. For sact data, we link samples using participant_id and disease_type and use the start date of the regimen; if there is a match (via disease_type), we ensure that the start date of the regimen and the date the tumour sample was taken are no more than one year apart.
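The window filter and the "closest entry per source" rule can be sketched as below. This is a minimal sketch under stated assumptions, not the pipeline's implementation: rows are assumed to be exported as dictionaries keyed by column name, one year is approximated as 365 days, interval_min may be missing, and the function names, sample values and the choice of diagnosisdatebest as the comparison date are hypothetical.

```python
from datetime import date


def within_window(rows, max_days):
    """Keep rows whose interval_min is at most max_days (in days)."""
    kept = []
    for row in rows:
        interval = row.get("interval_min")
        if interval is None:
            continue  # no interval recorded, so the window cannot be applied
        if abs(interval) <= max_days:
            kept.append(row)
    return kept


def closest_record(sample_date, records, date_key, max_days=365):
    """Return the record dated closest to sample_date, or None if every
    record is more than max_days away."""
    best, best_gap = None, None
    for record in records:
        gap = abs((record[date_key] - sample_date).days)
        if gap <= max_days and (best_gap is None or gap < best_gap):
            best, best_gap = record, gap
    return best


# Made-up rows for illustration only.
rows = [
    {"tumour_sample_platekey": "SAMPLE_A", "interval_min": 12},
    {"tumour_sample_platekey": "SAMPLE_B", "interval_min": 200},
    {"tumour_sample_platekey": "SAMPLE_C", "interval_min": None},
]
six_months = within_window(rows, max_days=183)
print([r["tumour_sample_platekey"] for r in six_months])  # ['SAMPLE_A']

sample_date = date(2018, 6, 1)
av_tumour_records = [
    {"diagnosisdatebest": date(2018, 5, 20), "stage_best": "2"},
    {"diagnosisdatebest": date(2017, 12, 1), "stage_best": "1"},
    {"diagnosisdatebest": date(2016, 1, 1), "stage_best": "1"},
]
chosen = closest_record(sample_date, av_tumour_records, "diagnosisdatebest")
print(chosen["stage_best"])  # 2
```

Because the consolidated table already applies the one-year rule, users would normally only need the first function, with a max_days value below 365.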

Staging and grading information available

| Staging type | Column header | Definition | Cancer type | Further information |
|---|---|---|---|---|
| TNM | integrated_tnm_stage_grouping | The overall integrated TNM stage grouping indicates the tumour stage after treatment and/or after all available evidence has been collected. | Any | Link |
| TNM | component_tnm_t | Tumour stage, if integrated TNM not supplied. This is the UICC code which classifies the size and extent of the primary tumour after treatment and/or after all available evidence has been collected. | Any | Link |
| TNM | component_tnm_n | Nodes stage, if integrated TNM not supplied. This is the UICC code which classifies the absence or presence and extent of regional lymph node metastases after treatment and/or after all available evidence has been collected. | Any | Link |
| TNM | component_tnm_m | Metastasis stage, if integrated TNM not supplied. This is the UICC code which classifies the absence or presence of distant metastases after treatment and/or after all available evidence has been collected. | Any | Link |
| TNM | t_best | The best tumour stage out of t_path and t_img, based on the shortest time-lapse after diagnosis. | Any | Link |
| TNM | n_best | The best nodes stage out of n_path and n_img, based on the shortest time-lapse after diagnosis. | Any | Link |
| TNM | m_best | The best metastasis stage out of m_path and m_img, based on the shortest time-lapse after diagnosis. | Any | Link |
| TNM | t_path | Tumour stage, determined from pathology data | Any | Link |
| TNM | n_path | Nodes stage, determined from pathology data | Any | Link |
| TNM | m_path | Metastasis stage, determined from pathology data | Any | Link |
| TNM | t_img | Tumour stage, determined from image data | Any | Link |
| TNM | n_img | Nodes stage, determined from image data | Any | Link |
| TNM | m_img | Metastasis stage, determined from image data | Any | Link |
| AJCC | ajcc_stage | American Joint Committee on Cancer staging of tumour at diagnosis. | Skin cancer | |
| FIGO | figo | Fédération Internationale de Gynécologie et d’Obstétrique staging | ovarian, endometrial, cervical, vaginal and vulval cancer | Link |
| FIGO | final_figo_stage | FIGO stage following surgery for uterine and vulval malignancies and for ovarian malignancies undergoing primary surgery. For ovarian malignancies planned to undergo neoadjuvant chemotherapy and for cases of cervical cancer (which is staged clinically), the final FIGO stage is determined at the time of review of clinical findings, imaging, cytology and biopsy histology. | ovarian, endometrial, cervical, vaginal and vulval cancer | Link |
| Dukes | dukes | Dukes' stage | Bowel cancer | Link |
| Dukes | modified_dukes_stage | Dukes' stage of disease at diagnosis (based on pathological evidence but upgraded to Dukes D if clinical evidence of metastasis). Dukes D should be recorded if metastatic spread is identified either in the preoperative staging process, e.g. on CT scanning, MRI, USS, chest x-ray or at the time of operation. It is accepted that a small number of D cases are cured by further treatment such as liver resection, but for COSD metastatic spread distant from the primary should always be recorded as D. | Bowel cancer | Link |
| Stage | stage_best | Best ‘registry’ stage at diagnosis of the tumour | All | |
| Stage | stage_best_system | System used to record best registry stage at diagnosis | All | |
| Stage | stage_path | Stage based on pathology | All | |
| Stage | stage_img | Stage based on imaging | All | |
| Gleason | gleason_primary | Gleason primary pattern | Prostate cancer | Link |
| Gleason | gleason_combined | Combined Gleason primary and secondary scores | Prostate cancer | Link |
| Grade | grade | Grade of Differentiation, how abnormal the cancer cells are | Any | Link |
| Oestrogen receptor status | er_status | Low levels of oestrogen receptor | Breast cancer | Link |
| Progesterone receptor status | pr_status | Low levels of progesterone receptor | Breast cancer | Link |
| HER2 status | her2_status | Elevated levels of human epidermal growth factor 2 | Breast cancer | Link |
| Nottingham Prognostic Index | npi | A calculation of the probability of success of surgery for breast cancer | Breast cancer | Link |
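Since the component TNM columns are described as applying when the integrated TNM stage grouping is not supplied, a reader might combine them with a simple fallback. The sketch below is one possible reading, not a documented procedure: the function name is hypothetical, and it assumes that a missing value appears as None or an empty string, which is not confirmed on this page.

```python
def summarise_tnm(row):
    """Prefer the integrated TNM stage grouping; fall back to the T, N and M
    components when it is not supplied. Returns None if nothing is recorded."""

    def clean(value):
        # Assumption: missing values are None or blank strings.
        if value is None:
            return None
        text = str(value).strip()
        return text or None

    integrated = clean(row.get("integrated_tnm_stage_grouping"))
    if integrated is not None:
        return {"source": "integrated", "stage": integrated}

    components = {
        "t": clean(row.get("component_tnm_t")),
        "n": clean(row.get("component_tnm_n")),
        "m": clean(row.get("component_tnm_m")),
    }
    if any(value is not None for value in components.values()):
        return {"source": "components", **components}
    return None


# Made-up row for illustration only.
example = {"integrated_tnm_stage_grouping": "", "component_tnm_t": "2", "component_tnm_n": "0"}
print(summarise_tnm(example))
# {'source': 'components', 't': '2', 'n': '0', 'm': None}
```

The values are returned exactly as stored, so no assumption is made about how stages are encoded in each column.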

Workflow

Location

This information can be found in LabKey in a table called cancer_staging_consolidated under the Bioinformatics tab. The cancer_staging_consolidated table connects to other tables in LabKey via the participant_id. In addition, tumour identifiers from different sources, i.e. tumour_id (Genomics England), av_tumour_pseudo_id (NCRAS) and sact_tumour_pseudo_id (NHSE), are given to identify the specific tumour.
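To show why the tumour identifiers matter when connecting to other tables, here is a small sketch of a join on participant_id alone versus participant_id plus tumour_id. It assumes both tables have been exported as lists of dictionaries; the helper name, the example_field column and all values are made up for illustration.

```python
from collections import defaultdict


def join_on_keys(left_rows, right_rows, keys):
    """Inner-join two lists of row dicts on the given key columns."""
    index = defaultdict(list)
    for right in right_rows:
        index[tuple(right.get(k) for k in keys)].append(right)

    joined = []
    for left in left_rows:
        for right in index.get(tuple(left.get(k) for k in keys), []):
            merged = dict(right)
            merged.update(left)  # left-hand values win on shared column names
            joined.append(merged)
    return joined


# One participant with two tumours (made-up values).
staging_rows = [
    {"participant_id": 1001, "tumour_id": "T1", "integrated_tnm_stage_grouping": "2"},
    {"participant_id": 1001, "tumour_id": "T2", "integrated_tnm_stage_grouping": "3"},
]
other_rows = [
    {"participant_id": 1001, "tumour_id": "T1", "example_field": "a"},
    {"participant_id": 1001, "tumour_id": "T2", "example_field": "b"},
]

by_participant = join_on_keys(staging_rows, other_rows, ["participant_id"])
by_tumour = join_on_keys(staging_rows, other_rows, ["participant_id", "tumour_id"])
print(len(by_participant), len(by_tumour))  # 4 2
```

Joining on participant_id alone pairs every tumour with every other tumour of the same participant, whereas adding the tumour identifier keeps one row per tumour.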

Table schema

The cancer_staging_consolidated table contains the following columns, grouped by source:

from cancer_analysis:

  • participant_id
  • tumour_sample_platekey
  • tumour_id
  • disease_type
  • tumour_type
  • tumour_clinical_sample_time

from cancer_participant_tumour:

  • diagnosis_date
  • diagnosis_icd_code
  • integrated_tnm_stage_grouping
  • component_tnm_t
  • component_tnm_n
  • component_tnm_m
  • ajcc_stage
  • final_figo_stage
  • modified_dukes_stage

from av_tumour:

  • av_tumour_pseudo_id
  • diagnosisdatebest
  • site_icd10_o2
  • stage_best
  • t_best
  • n_best
  • m_best
  • stage_best_system
  • stage_path
  • t_path
  • n_path
  • m_path
  • stage_img
  • t_img
  • n_img
  • m_img
  • dukes
  • figo
  • gleason_primary
  • gleason_combined
  • grade
  • behaviour_coded_desc
  • histology_coded_desc
  • er_status
  • pr_status
  • her2_status
  • npi

from sact:

  • sact_tumour_pseudo_id
  • primary_diagnosis
  • start_date_of_regimen
  • stage_at_start

calculated:

  • interval_min
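Given the column groups above, one way to see which source datasets contributed staging information to a row is sketched below. Which columns count as "staging information" is an assumption made here for illustration (identifier, date and diagnosis columns are left out), as are the function names and the treatment of None or blank strings as missing.

```python
# Staging-related columns per source, taken from the schema above.
# The selection of columns is an illustrative assumption.
STAGING_COLUMNS_BY_SOURCE = {
    "cancer_participant_tumour": [
        "integrated_tnm_stage_grouping", "component_tnm_t", "component_tnm_n",
        "component_tnm_m", "ajcc_stage", "final_figo_stage", "modified_dukes_stage",
    ],
    "av_tumour": [
        "stage_best", "t_best", "n_best", "m_best", "stage_path", "t_path",
        "n_path", "m_path", "stage_img", "t_img", "n_img", "m_img", "dukes", "figo",
    ],
    "sact": ["stage_at_start"],
}


def has_value(value):
    """Treat None and blank strings as missing (an assumption)."""
    return value is not None and str(value).strip() != ""


def sources_with_staging(row):
    """List the source datasets that supply at least one staging value."""
    return [
        source
        for source, columns in STAGING_COLUMNS_BY_SOURCE.items()
        if any(has_value(row.get(column)) for column in columns)
    ]


# Made-up row for illustration only.
row = {"participant_id": 1001, "stage_best": "2", "stage_at_start": "", "ajcc_stage": None}
print(sources_with_staging(row))  # ['av_tumour']
```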

Cancer staging statistics

The statistics for the cancer_staging_consolidated table for Cancer staging V8 data release can be found here: Cancer staging V8 Statistics (28-11-2019)

The statistics for the cancer_staging_consolidated table for Cancer staging V9 data release can be found here: Cancer staging V9 Statistics (02-04-2020)

The statistics for the cancer_staging_consolidated table for Cancer staging V10 data release can be found here: Cancer staging V10 Statistics (03-09-2020)

The statistics for the cancer_staging_consolidated table for Cancer staging V11 data release can be found here: Cancer staging V11 Statistics (17-12-2020)

The statistics for the cancer_staging_consolidated table for Cancer staging V12 data release can be found here: Cancer staging V12 Statistics (06-05-2021)

The statistics for the cancer_staging_consolidated table for Cancer staging V13 data release can be found here: Cancer staging V13 Statistics (30-09-2021)

The statistics for the cancer_staging_consolidated table for Cancer staging V14 data release can be found here: Cancer staging V14 Statistics (27-01-2022)

The statistics for the cancer_staging_consolidated table for Cancer staging V15 data release can be found here: Cancer staging V15 Statistics (26-05-2022)

The statistics for the cancer_staging_consolidated table for Cancer staging V16 data release can be found here: Cancer staging V16 Statistics (13-10-2022)

The statistics for the cancer_staging_consolidated table for Cancer staging V17 data release can be found here: Cancer staging V17 Statistics (30-03-2023)

The statistics for the cancer_staging_consolidated table for Cancer staging V18 data release can be found here: Cancer staging V18 Statistics (21-12-2023)

The statistics for the cancer_staging_consolidated table for Cancer staging V19 data release can be found here: Cancer staging V19 Statistics (31-10-2024)

Feedback

This table was first included in data release 8. If you have suggestions about this table or would like to request edits that would be useful for your analyses, please let us know by contacting us via the Genomics England Service Desk.