Pathology synoptic reports dataset
The synoptic reports dataset consolidates the work of a team of pathologists who manually extracted medical entities from free-text pathology reports (collected by and received from the National Disease Registration Service) into versioned, structured documents that follow the guidelines set by the Royal College of Pathologists (RCPath). These reports use a synoptic proforma, meaning they present pathology findings in a standardised, checklist-style format rather than as free-text descriptions.
Structure
The dataset covers 8,370 participants, primarily from the Colorectal and Breast Cancer cohorts.
Each row in the dataset holds the value of one report section for a given case and participant. Reports are versioned and carry a specific report name that reflects the specimen or tissue type.
Example
| participant_id | CaseID | ReportName | ReportVersion | SectionPathLabel | SectionValue |
|---|---|---|---|---|---|
| P001 | P001-1 | Breast Excision (RCPath) | 9.2.1702.89 | Hidden Registry Fields ::HER 2 Status | HER 2 Status |
| P002 | P002-1 | Breast Core Biopsy (RC Path) | 9.2108.0.58 | Invasive Carcinoma | Invasive Carcinoma |
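Because each row holds a single section value, it can help to regroup the rows so that all sections of one case sit together. The following is a minimal sketch, not an official loader: the rows are copied from the example table above, `group_sections_by_case` is a hypothetical helper name, and treating `::` in `SectionPathLabel` as a separator between nesting levels is an assumption based only on the first example row.

```python
# Illustrative rows copied from the example table; real data would be
# loaded from the dataset itself.
rows = [
    {
        "participant_id": "P001",
        "CaseID": "P001-1",
        "ReportName": "Breast Excision (RCPath)",
        "ReportVersion": "9.2.1702.89",
        "SectionPathLabel": "Hidden Registry Fields ::HER 2 Status",
        "SectionValue": "HER 2 Status",
    },
    {
        "participant_id": "P002",
        "CaseID": "P002-1",
        "ReportName": "Breast Core Biopsy (RC Path)",
        "ReportVersion": "9.2108.0.58",
        "SectionPathLabel": "Invasive Carcinoma",
        "SectionValue": "Invasive Carcinoma",
    },
]


def group_sections_by_case(rows):
    """Group section values by (participant_id, CaseID).

    Hypothetical helper: returns a dict that maps each case to a dict of
    section path -> list of section values.
    """
    cases = {}
    for row in rows:
        key = (row["participant_id"], row["CaseID"])
        # Assumption: "::" separates nesting levels in SectionPathLabel.
        path = tuple(part.strip() for part in row["SectionPathLabel"].split("::"))
        # Values are kept in a list so repeated sections are not overwritten.
        cases.setdefault(key, {}).setdefault(path, []).append(row["SectionValue"])
    return cases


cases = group_sections_by_case(rows)
for (participant_id, case_id), sections in cases.items():
    print(participant_id, case_id, sections)
```

Keeping the values in a list per section path is a deliberate choice here: if a section occurs more than once within a case, no value is silently dropped.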