
Pathology reports for cancer participants

Pathology investigations are the gold standard for diagnosing and characterising cancers. Pathology reports let you see the pathologist's reasoning and detailed findings from the imaging investigations (pathology slides) in the context of the patient timeline.

We previously released pathology reports processed with an older anonymisation pipeline. The new data, received in late 2024, include more reports covering thousands more participants, and were processed with a newer pipeline that significantly reduces over-redaction of clinically useful information. The result is more data, of better quality.

Data context and applications

We have seen researchers read these reports to create their own labels or patient timelines for their analyses, or to check classifications they derived from other clinical data sources. If you need to create labels across a large participant cohort, consider raising a Service Desk ticket: in-house labelling experts can automate the process while still letting you check the results yourself.
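As a minimal sketch of building a per-participant report timeline, the pandas code below orders reports by `received_date` within each `participant_id`. It assumes the reports have been loaded into a DataFrame with the columns described in the data dictionary; the sample rows and IDs are invented for illustration, and `report_timeline` is a hypothetical helper, not part of any provided tooling.

```python
import pandas as pd

# Invented sample rows; only the column names follow the data dictionary.
reports = pd.DataFrame(
    {
        "gel_report_id": ["report_a", "report_b", "report_c"],
        "participant_id": ["p1", "p1", "p2"],
        "received_date": pd.to_datetime(["2021-03-24", "2020-11-02", "2022-01-15"]),
    }
)


def report_timeline(df: pd.DataFrame) -> dict[str, list[str]]:
    """Return each participant's report IDs in chronological order (hypothetical helper)."""
    # Sort by participant first, then by date, so each group is already in date order.
    ordered = df.sort_values(["participant_id", "received_date"])
    # Collect the ordered report IDs for each participant into a list.
    return ordered.groupby("participant_id")["gel_report_id"].agg(list).to_dict()


print(report_timeline(reports))
```

Each participant's list can then be walked in order, for example to attach the labels you extract from each report's text to a date.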

If you are using the pathology imaging dataset, you should identify the corresponding report for each pathology case. A participant's set of pathology images usually corresponds to one report but may correspond to several. The report gives valuable context, such as the date, reason and findings of the investigation. It also contains block keys, which let you identify which slides contain tumour, lymph nodes or other tissue, and the presence of stains.

If you are using the older dataset and want to use or compare it with the newer data, use the gel_report_id field: where a gel_report_id matches a record in the older dataset, both records should refer to the same report.
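One way to compare the two releases is an outer join on `gel_report_id` with pandas' merge indicator, which labels each report as present only in the older release, only in the newer one, or in both. This is a minimal sketch: the two DataFrames are hypothetical extracts with invented rows, and only the `gel_report_id` column name comes from the data dictionary.

```python
import pandas as pd

# Hypothetical extracts of the older and newer releases (invented rows).
old = pd.DataFrame({"gel_report_id": ["r1", "r2", "r3"]})
new = pd.DataFrame({"gel_report_id": ["r2", "r3", "r4", "r5"]})


def compare_releases(old: pd.DataFrame, new: pd.DataFrame) -> pd.Series:
    """Count reports found only in the old release, only in the new one, or in both."""
    merged = old.merge(
        new,
        on="gel_report_id",
        how="outer",                 # keep reports from either release
        indicator=True,              # adds "_merge": left_only / right_only / both
        suffixes=("_old", "_new"),   # keeps any other shared columns distinguishable
    )
    return merged["_merge"].value_counts()


print(compare_releases(old, new))
```

For rows marked `both`, comparing the `_old` and `_new` versions of a shared text column (for example `redacted_text`, if both extracts include it) is one way to see where the newer pipeline redacts less.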

Data dictionary

| Column | Data type | Example | Notes |
| --- | --- | --- | --- |
| gel_report_id | string | report_0006_000056 | Unique identifier of each row/report. |
| participant_id | string | 111000055 | Use this field to link to other tables across the Research Environment. |
| received_date | datetime64[ns] | 2021-03-24 | The "collected date" when available; otherwise the "received date"; otherwise the "authorised date". |
| redacted_text | string | CLINICAL DETAILS:\nRight central excision of n... | The redacted (anonymised) report text. |
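The precedence rule for `received_date` behaves like a coalesce: take the collected date, fall back to the received date, then to the authorised date. The sketch below reproduces that rule with `Series.combine_first`. The source columns `collected_date`, `source_received_date` and `authorised_date` are hypothetical names used only for illustration; they are not fields of the released table, which already contains the resolved `received_date`.

```python
import pandas as pd

# Hypothetical source dates (not columns of the released table); NaT marks a missing date.
dates = pd.DataFrame(
    {
        "collected_date": pd.to_datetime(["2021-03-20", None, None]),
        "source_received_date": pd.to_datetime(["2021-03-24", "2021-05-02", None]),
        "authorised_date": pd.to_datetime(["2021-03-30", "2021-05-10", "2021-07-01"]),
    }
)


def resolve_report_date(df: pd.DataFrame) -> pd.Series:
    """Collected date if present, else received date, else authorised date."""
    # combine_first fills missing values in the left Series from the right one,
    # so chaining the calls applies the three dates in order of precedence.
    return (
        df["collected_date"]
        .combine_first(df["source_received_date"])
        .combine_first(df["authorised_date"])
    )


print(resolve_report_date(dates))
```

Because the resolved date can come from different events, two reports with similar `received_date` values may reflect a collection date in one case and an authorisation date in the other.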