The cancer_analysis table contains one entry for every sequenced tumour sample, with the results of the Genomics England interpretation pipeline. Once samples have been sequenced and had variants called by the Illumina pipeline, Genomics England passes them through its interpretation pipeline, which applies further QC, annotates the called variants and performs analyses such as estimating tumour mutational burden and computing mutational signatures.
Samples are uniquely identified by their tumour_sample_platekey and are matched to the information on their germline sample, as well as disease type, quality control measures, tumour mutational burden (somatic_coding_variants_per_mb), signatures and the paths to the BAM and VCF files. Note that one participant may have more than one tumour sample, from the same or different tumours.
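Because a participant can contribute several tumour samples, it is often useful to group rows by participant before analysis. The following is a minimal sketch in plain Python, assuming the rows have already been exported as dictionaries. The column names participant_id and tumour_sample_platekey come from this page, while the values and the helper name samples_by_participant are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows: the values are invented, only the column names
# (participant_id, tumour_sample_platekey) are taken from this page.
rows = [
    {"participant_id": 1001, "tumour_sample_platekey": "PLATE_A"},
    {"participant_id": 1001, "tumour_sample_platekey": "PLATE_B"},
    {"participant_id": 1002, "tumour_sample_platekey": "PLATE_C"},
]


def samples_by_participant(rows):
    """Map each participant_id to the list of its tumour sample platekeys."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["participant_id"]].append(row["tumour_sample_platekey"])
    return dict(grouped)


grouped = samples_by_participant(rows)

# Participants with more than one tumour sample.
multi_sample = {pid: keys for pid, keys in grouped.items() if len(keys) > 1}
print(multi_sample)
```

Whether several samples from one participant should be analysed together or separately depends on whether they come from the same tumour, which this sketch does not try to decide.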
TCGA study and histology
Details of histology codes and TCGA studies are included. Histology codes are taken directly from the av_tumour table, while TCGA studies are deduced from the av_tumour histology codes or, if these are not available, from ICD-10 codes in hes_apc. To help you assess the quality of this information, the difference between the hes_apc date and the sampling date is included, along with a match_rank code indicating how well these tables match. Details of this analysis can be found in this document.
Tumour mutational burden (TMB)
For each tumour sample, TMB is calculated as the total number of non-synonymous small somatic variants divided by the total length of coding sequence (32.61 Mb). Small somatic variants are somatic SNVs and indels smaller than 50 bp. TMB is found in the somatic_coding_variants_per_mb column.
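The TMB definition above amounts to a single division. The sketch below illustrates it in Python. The function name and the example count of 326 variants are hypothetical, and only the 32.61 Mb denominator comes from the text.

```python
# Total length of coding sequence in megabases, as stated in the text.
CODING_SEQUENCE_MB = 32.61


def tumour_mutational_burden(n_nonsynonymous_small_variants,
                             coding_mb=CODING_SEQUENCE_MB):
    """Return non-synonymous small somatic variants per Mb of coding sequence."""
    if n_nonsynonymous_small_variants < 0:
        raise ValueError("variant count cannot be negative")
    return n_nonsynonymous_small_variants / coding_mb


# Hypothetical sample with 326 non-synonymous small somatic variants.
tmb = tumour_mutational_burden(326)
print(round(tmb, 2))
```

The count passed in is assumed to contain only non-synonymous somatic SNVs and indels smaller than 50 bp. The filtering itself is not shown here.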
Somatic mutational signatures
Somatic mutations are the consequence of multiple mutational processes that the human body is subjected to throughout life. Each process generates a characteristic combination of mutation types, called a mutational signature. Genomics England computes mutational signatures using the R package nnls. For further information on how the signatures are computed, see Alexandrov et al, 2013.
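The nnls package solves non-negative least squares problems. As a rough sketch of how a signature fit is commonly posed in that form (the symbols m, S and e are introduced here for illustration and are not taken from the Genomics England pipeline documentation), the contribution of each reference signature to a sample can be estimated as:

```latex
\hat{e} \;=\; \underset{e \,\ge\, 0}{\arg\min}\; \lVert m - S\,e \rVert_2^2
```

Here m is assumed to be the vector of somatic mutation counts per mutation type in the sample, each column of S a reference signature, and e the vector of non-negative signature contributions. The exact inputs and any post-processing used by the pipeline are described in the referenced documents, not here.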
For more information on QC metrics and how variants are called for the tumour sample, please refer to the Cancer Analysis Technical Information Document.
Small variant annotation
SNVs and small indels were normalised (left aligned, trimmed, multi-allelic variants decomposed) and annotated using CellBase with the Ensembl (version 90/GRCh38), COSMIC (version v86/GRCh38) and ClinVar (October 2018 release) databases. CellBase takes advantage of the data integrated in its database to implement a rich and high-performance variant annotator (with 99.9991% concordance with Ensembl VEP Consequence Types across 1000 Genomes Phase 3 variants). Only variants annotated with the following consequence types in canonical transcripts (see List of canonical transcripts v1.10) are reported:
|SO term||Consequence type|
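Two of the normalisation steps named in this section, multi-allelic decomposition and trimming, can be illustrated with a short Python sketch. This is not the pipeline's implementation: the function names and the example record are hypothetical, and left alignment is omitted because it needs the reference sequence.

```python
def decompose(pos, ref, alts):
    """Split a multi-allelic record into one (pos, ref, alt) per ALT allele."""
    return [(pos, ref, alt) for alt in alts]


def trim(pos, ref, alt):
    """Remove bases shared by REF and ALT, keeping at least one base in each."""
    # Drop shared trailing bases first; the position is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then drop shared leading bases, moving the position right each time.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Hypothetical record at position 100 with REF "CAG" and two ALT alleles.
records = [trim(*rec) for rec in decompose(100, "CAG", ["CTG", "C"])]
print(records)
```

In this example the first allele reduces to a single-base substitution at position 101, while the deletion allele is already minimal and stays at position 100.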
A note on germline flagged variants
We have observed that pathogenic variants have been called in some germline VCFs, and that these are present in many of the cancer participants. We have provided a list of known germline variants that are filtered out in our own clinical tiering pipelines. You can find more information, including the location of this list, on this page: faq.
This information can be found in LabKey in the cancer_analysis table, under the quick view tab.
cancer_analysis connects to other tables via the participant_id. If discrepancies are found between cancer_analysis and other tables, the data in cancer_analysis should be treated as the most reliable, since it has been validated by the interpretation pipeline.
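As an illustration of linking on participant_id while giving precedence to cancer_analysis when values diverge, here is a minimal sketch in plain Python. It assumes rows have been exported as dictionaries. The column names disease_type and year_of_birth, the values, and the helper name are all hypothetical.

```python
def merge_preferring_cancer_analysis(cancer_rows, other_rows):
    """Join on participant_id; cancer_analysis values win on shared columns."""
    # Simplification: if the other table has several rows per participant,
    # only the last one is kept.
    other_by_pid = {row["participant_id"]: row for row in other_rows}
    merged = []
    for ca_row in cancer_rows:
        other = other_by_pid.get(ca_row["participant_id"], {})
        # Later keys override earlier ones, so cancer_analysis takes precedence.
        merged.append({**other, **ca_row})
    return merged


# Hypothetical rows from cancer_analysis and from another table.
cancer_rows = [
    {"participant_id": 1001, "tumour_sample_platekey": "PLATE_A",
     "disease_type": "BREAST"},
]
other_rows = [
    {"participant_id": 1001, "disease_type": "OTHER", "year_of_birth": 1960},
]

merged = merge_preferring_cancer_analysis(cancer_rows, other_rows)
print(merged[0]["disease_type"])
```

Iterating over the cancer_analysis rows keeps one output row per tumour sample, so participants with several samples are not collapsed.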