Accessing medical history data programmatically¶
Give us feedback on this tutorial
There are a huge number of tables that you can use to query medical history. This tutorial will take you through some of the tables and give you some example scripts that you can use in python or R to access the data, or using R in an interactive session on CloudOS, to access the data.
The advantage of querying medical history programmatically is that the underlying tables are exposed, allowing you to verify the data in bulk. This also allows you to do very complex queries. You can save and tweak scripts to re-use in new releases or for similar queries.
Tables to query¶
To access medical history, you might query the following tables:
Table | Full name | Details | Programme availability |
---|---|---|---|
hes_apc |
Hospital episode statistics - Admitted patient care | Details of any overnight stays in hospital | 100kGP and NHS GMS |
hes_op |
Hospital episode statistics - Outpatients | Details of any planned day appointments in hospital | 100kGP and NHS GMS |
hes_ae |
Hospital episode statistics - Accident and emergency | Details of unplanned walk-in/ambulance visits to hospital | 100kGP and NHS GMS |
hes_cc |
Hospital episode statistics - Critical care | Details of any time spent on life support | 100kGP and NHS GMS |
ecds |
Emergency care dataset | Details of unplanned walk-in/ambulance visits to hospital | 100kGP and NHS GMS |
did |
Diagnostic imaging metadata | Details of any medical imaging | 100kGP only |
The tables for medical history are extensive and were designed for medical record keeping, not for scientific research. They may have a lot of fields not relevant to research, or that are confusing. We recommend referring to the latest data dictionaries for 100kGP and NHS GMS for full details of these tables.
For your convenience, concatenations of some of the diagnosis and treatment columns, can make them easier to query. All the example code makes use of these concatenated columns, rather than the source data.
Diagnosis and treatment columns
Table | Concatenated Column Name | Source Columns | Description | Codes used |
---|---|---|---|---|
ecds |
diagnosis_code_all |
diagnosis_code_01 - diagnosis_code_12 |
Diagnoses | SNOMED CT |
ecds |
diagnosis_qualifier_all |
diagnosis_qualifier_01 - diagnosis_qualifier_12 |
Diagnosis qualifiers | SNOMNED CT |
ecds |
investigation_code_all |
investigation_code_01 - investigation_code_12 |
Investigations | SNOMED CT |
ecds |
treatment_code_all |
treatment_code_01 - treatment_code_12 |
Treatments | SNOMED CT |
hes_apc |
diag_all |
diag_01 - diag_20 |
Diagnoses | ICD-10 |
hes_apc |
opertn_all |
opertn_01 - opertn_24 |
Treatments | OPCS |
hes_ae |
diag_all |
diag_01 - diag_12 |
Diagnoses | ICD-10 |
hes_ae |
diag2_all |
diag2_01 - diag2_12 |
Diagnoses in two characters | Refer to the 100kGP and NHS GMS data dictionaries |
hes_ae |
diaga_all |
diaga_01 - diaga_12 |
Diagnosis anatomical area | Refer to the 100kGP and NHS GMS data dictionaries |
hes_ae |
diags_all |
diags_01 - diags_12 |
Diagnosis anatomical site | Left, Right, Bilateral, not applicable |
hes_ae |
invest_all |
invest_01 - invest_12 |
Investigations | 6an |
hes_ae |
invest2_all |
invest2_01 - invest2_12 |
Investigations | Refer to the 100kGP and NHS GMS data dictionaries |
hes_ae |
treat2_all |
treat2_01 - treat2_12 |
Treatment two-letter codes | Refer to the 100kGP and NHS GMS data dictionaries |
hes_ae |
treat_all |
treat_01 - treat_12 |
Treatments | 6an |
hes_op |
diag_all |
diag_01 - diag_12 |
Diagnoses | ICD-10 |
hes_op |
opertn_all |
opertn_01 - opertn_24 |
Treatments | OPCS |
Links between tables
Some of the tables have fields which link them together, allowing you to see how medical events relate to each other. However many of the tables can only be linked through participant IDs and corresponding dates.
Tables | Column(s) linking them | Details |
---|---|---|
hes_apc and hes_ae |
aekey and epikey |
Where an accident and emergency visit results in the participant being admitted to an overnight stay |
hes_apc and hes_cc |
susrecid |
Where a participant spends some time on life-support during an overnight stay |
Dates in tables
Many of the tables have multiple date fields, signifying different events. We have highlighted that fields we think are the most useful.
Table | Field | Meaning | Most useful date in the table? |
---|---|---|---|
did |
did_date1 |
The date of referral | |
did |
did_date2 |
The date the referral was received | |
did |
did_date3 |
The date of imaging | yes |
did |
did_date4 |
The date of the report | |
hes_ae |
appdate |
Record updates | |
hes_ae |
arrivaldate |
the date of arrival | yes |
hes_ae |
cdsextdate |
Record updates | |
hes_ae |
subdate |
Record updates | |
hes_ae |
suslddate |
Record updates | |
hes_apc |
apcend_1 - apcend_9 |
End date of periods of augmented care | |
hes_apc |
apcstar_1 - apcstar_9 |
Start date of periods of augmented care | |
hes_apc |
admidate |
Date of admission | yes |
hes_apc |
anasdate |
Date of first antenatal assessment | |
hes_apc |
cdsextdate |
Record updates | |
hes_apc |
disdate |
Date of discharge | yes |
hes_apc |
disreadydate |
Date the participant was medically ready for discharge | |
hes_apc |
elecdate |
Date of decision to admit | |
hes_apc |
epiend |
Date of end of care under a particular consultant | |
hes_apc |
epistart |
Date of start of care under a particular consultant | |
hes_apc |
opdate_01 - opdate_24 |
Date of corresponding numbered treatment in columns oper_01 - oper_24 |
|
hes_apc |
rtt_per_end |
End date of a referral treatment period | |
hes_apc |
rtt_per_start |
Start date of a referral treatment period | |
hes_apc |
subdate |
Record updates | |
hes_apc |
suslddate |
Record updates | |
hes_cc |
ccdisdate |
Date of discharge | yes |
hes_cc |
ccstartdate |
Date of start | yes |
hes_op |
apptdate |
Date of appointment | yes |
hes_op |
dnadate |
Date of missed appointment | |
hes_op |
reqdate |
Date referral received | |
hes_op |
rtt_per_end |
End date of a referral treatment period | |
hes_op |
rtt_per_start |
Start date of a referral treatment period | |
hes_op |
subdate |
Record updates | |
hes_op |
suslddate |
Record updates | |
ecds |
arrival_date |
Date of arrival | yes |
ecds |
assessment_date |
Date of assessment | yes |
ecds |
conclusion_date |
Date of conclusion of treatment in accident and emergency | |
ecds |
departure_date |
Date of leaving accident and emergency | yes |
ecds |
injury_date |
Date of injury | yes |
ecds |
investigation_date_01 - investigation_date_12 |
Date of corresponding numbered investigation in columns investigation_code_01 - investigation_code_12 |
|
ecds |
period_end_date |
End date of referral to treatment | |
ecds |
period_start_date |
Start date of referral to treatment | |
ecds |
referral_assessment_date_1 - referral_assessment_date_4 |
Date of corresponding numbered referral in columns referred_to_service_1 - referred_to_service_4 |
|
ecds |
service_request_date_1 - service_request_date_4 |
Date of corresponding numbered referral in columns referred_to_service_1 - referred_to_service_4 |
|
ecds |
treatment_date_01 - treatment_date_10 |
Date of corresponding numbered treatment in columns treatment_code_01 - treatment_code_10 |
Setting up access¶
Before you get started with using the LabKey APIs, you need to ensure you have it set up correctly, following this document. We recommend running your LabKey API scripts on the HPC, as detailed here, as they can be quite large.
Before you get started with using the LabKey APIs, you need to ensure you have it set up correctly, following this document. We recommend running your LabKey API scripts on the HPC, as detailed here, as they can be quite large.
To query our tables using CloudOS, you will need to set up an interactive session. You will need to use conda to install the required R packages:
conda create -n query-tables r-glue r-tidyverse r-data.table r-dbi r-rpostgres r-irkernel -y
conda activate query-tables
You can now launch an R notebook under the query-tables
kernel. More details
Contents:¶
- Import modules/libraries you need
- Helper function to access the LabKey API
tidyup
function for making diagnosis and treatment columns for readable- Get details from a single
hes_*
table orecds
for a participant - Plot events in medical history over time
- Match accident and emergency episodes to overnight admissions
- Find critical care periods within an overnight stay
- Pull out time periods from medical history tables
- Find events and episodes within a time period
Import modules/libraries you need ¶
LabKey has an underlying SQL database. To access it you will need to have the labkey module/library loaded. You will also be working with tables and dataframes. Here you can see the modules/libraries you should load in your scripts:
If you are using R version 3 or above, you should also add the following line:
Helper function to access the LabKey API ¶
We have here a helper function called labkey_to_df
that you can use to access the LabKey API in the RE and return data as a dataframe. You need to give it:
sql_query
: an SQL query to access the LabKey tablesdatabase
: the version of the database you want to access for example/main-programme/main-programme_v19_2024-10-31
maxrows
: maximum number of rows you want to return. This parameter defaults to 100000000, but if the output contains exactly the default value, then we suggest increasing this value.
Feel free to include this function in all your scripts and invoke it every time you want to query the data.
To run my queries, I'll need to set up my database version:
tidyup
function for making diagnosis and treatment columns more readable¶
Many of the queries presented in this tutorial will pull up columns of concatenated data, such as diag_all
and opertn_all
. These are presented in the form:
code1|code2|code3||||||||
In this example above, code1
etc represent an ICD10 or opcs code. Each vertical pipe |
represents one of the columns that were concatenated together; where these columns were empty, there are no spaces between the pipes.
The following function uses a regex to tidy up these to comma separated values, deleting any empty fields. It takes a dataframe as input and returns the tidied-up dataframe. You can use this any time you want to view the output of these analyses.
Get details from a single hes_*
table or ecds
for a participant¶
We're going to find all diagnoses for an individual from a hes_*
table. In this case we're going to query hes_ae
. If you want to query hes_apc
or hes_op
, you would need to change the date field to one(s) available in those tables; you might also want to add further diagnosis and treatment columns.
In this and all following code we will refer to a string part_id
, which is the participant_id of the participant of interest. Assume that you have already defined this string in your code.
The table ecds
covers emergency admissions after 2017. The ecds
table has different column headings, so we have to alter our queries accordingly. The following query gets all the diagnoses, treatments and investigations for a participant, with the date of arrival.
Plot events in medical history over time ¶
The following code allows you to pull out all events from a participants medical history, and display them in a single plot.
Find all diagnoses¶
First, we will pull out all diagnoses from the hes_*
and ecds
tables. Dates in all these tables are coded differently.
Instead of using our tidyup function, we want to split the table to have one diagnosis per row
Now we can plot all diagnoses over time.
Find all investigations¶
To find all investigations, we will search the hes_ae
, did
(100kGP only) and ecds
tables.
We can process and plot this in the same way as we did for the diagnoses.
Find all treatments¶
Finally, we can do the same for treatments, fetching from the hes_ae
, hes_apc
, hes_op
and ecds
tables, and processing and plotting in the same way.
Match accident and emergency episodes to overnight admissions¶
Where accident and emergency admissions resulted in the participant being admitted for overnight care, the entries in the hes_ae
and hes_apc
tables are linked by the aekey
and epikey
. The following query finds all diagnoses and operations the participant received in both A&E and after admission to overnight care. Note that in this code we rename the column headers, since many of these are the same between hes_ae
and hes_apc
.
Find critical care periods within an overnight stay¶
Critical care periods in the hes_cc
table all fall within periods of overnight care in hes_apc
. These are linked together by the susrecid
column.
Pull out time periods from medical history tables¶
Other tables, such as the hes_op
table of Outpatients day appointments at a hospital, and did
table of diagnostic imaging, do not have explicit links to other tables. These can only be linked to other medical history tables by the dates of events.
The following code pulls out the start and end dates of periods of admitted patient care where a diagnosis was made (in this case R06, abnormalities of breathing). We will pull dates out of the table apc_cc_query
that we created above.
Alternatively, you may wish to take a start or end date from one event and find all other events within a period of time before or after that event.
Find events and episodes within a time period¶
We can use the dates we have pulled out to find all events in other tables that occurred within that time period. In the following query, we will query the did
table of diagnostic imaging in 100kGP for rows where the imaging date did_date3
falls within the hospital stay.
To match up rows from did
to rows from our hes_*
tables, we will create our own index called did_index
.