
Pathology synoptic reports dataset

The synoptic reports dataset consolidates the work of a team of pathologists who manually extracted medical entities from free-text pathology reports (collected by and received from the National Disease Registration Service) into versioned, structured documents that follow the guidelines set by the Royal College of Pathologists (RCPath). These reports use a synoptic proforma, meaning they present pathology findings in a standardised, checklist-style format rather than as free-text descriptions.

Structure

The dataset covers 8,370 participants, primarily from the colorectal and breast cancer cohorts.

Each row in the dataset holds a single section value taken from one section of a report, for a given case and participant. Reports are versioned and carry a report name (for example, Breast Excision or Breast Core Biopsy) that reflects the specimen or tissue type.
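Because each row carries only one section value, reading a whole report means gathering its rows back together. One way to do that is to group rows by participant, case, report name and report version. The Python sketch below assumes rows arrive as dicts keyed by the column names shown in the example table (for instance from `csv.DictReader` over a CSV export; the delivery format is an assumption, not something this page states). `group_by_report` and `EXAMPLE_ROWS` are hypothetical names used for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

# (participant_id, CaseID, ReportName, ReportVersion) identifies one report version.
ReportKey = Tuple[str, str, str, str]


def group_by_report(
    rows: Iterable[Mapping[str, str]],
) -> Dict[ReportKey, Dict[str, List[str]]]:
    """Fold long-format rows into {SectionPathLabel: [SectionValue, ...]} per report.

    Values are kept in lists in case a label repeats within a report
    (an assumption; uniqueness of labels is not documented here).
    """
    reports: Dict[ReportKey, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = (
            row["participant_id"],
            row["CaseID"],
            row["ReportName"],
            row["ReportVersion"],
        )
        reports[key][row["SectionPathLabel"]].append(row["SectionValue"])
    # Convert nested defaultdicts to plain dicts so missing keys raise KeyError.
    return {key: dict(sections) for key, sections in reports.items()}


# Rows copied from the example table on this page.
EXAMPLE_ROWS = [
    {
        "participant_id": "P001",
        "CaseID": "P001-1",
        "ReportName": "Breast Excision (RCPath)",
        "ReportVersion": "9.2.1702.89",
        "SectionPathLabel": "Hidden Registry Fields ::HER 2 Status",
        "SectionValue": "HER 2 Status",
    },
    {
        "participant_id": "P002",
        "CaseID": "P002-1",
        "ReportName": "Breast Core Biopsy (RC Path)",
        "ReportVersion": "9.2108.0.58",
        "SectionPathLabel": "Invasive Carcinoma",
        "SectionValue": "Invasive Carcinoma",
    },
]

if __name__ == "__main__":
    for key, sections in group_by_report(EXAMPLE_ROWS).items():
        print(key, sections)
```

Including `ReportVersion` in the key keeps different versions of the same report apart; drop it from the key only if you deliberately want versions merged.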

Example

| participant_id | CaseID | ReportName | ReportVersion | SectionPathLabel | SectionValue |
|---|---|---|---|---|---|
| P001 | P001-1 | Breast Excision (RCPath) | 9.2.1702.89 | Hidden Registry Fields ::HER 2 Status | HER 2 Status |
| P002 | P002-1 | Breast Core Biopsy (RC Path) | 9.2108.0.58 | Invasive Carcinoma | Invasive Carcinoma |
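The `SectionPathLabel` in the first example row appears to encode a nested path, with `::` separating a parent section (Hidden Registry Fields) from the field inside it (HER 2 Status). Assuming that reading holds across the dataset, which this page does not confirm, a small Python helper can split labels into their components and trim the stray space before the separator. `split_section_path` and `leaf_label` are hypothetical names.

```python
from typing import List


def split_section_path(label: str, sep: str = "::") -> List[str]:
    """Split a SectionPathLabel into path components, trimming whitespace.

    Assumes `sep` separates nested section levels, as in the example
    label 'Hidden Registry Fields ::HER 2 Status'. Empty parts are dropped.
    """
    return [part.strip() for part in label.split(sep) if part.strip()]


def leaf_label(label: str) -> str:
    """Return the innermost component of a SectionPathLabel ('' if empty)."""
    parts = split_section_path(label)
    return parts[-1] if parts else ""


if __name__ == "__main__":
    for label in ["Hidden Registry Fields ::HER 2 Status", "Invasive Carcinoma"]:
        print(split_section_path(label), leaf_label(label))
```

A label without `::`, such as Invasive Carcinoma, comes back as a single-element path, so both kinds of row can be handled by the same code.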