Building cohorts
These tutorials take you through the methods you can use to build cohorts based on phenotypes. You can use them to build tables of participants who share particular phenotypes, and to extract information such as their identifiers, genome file locations, phenotypes and covariates (for example age, sex and ethnicity).
You can search for participants using a no-code interface in Participant Explorer, or using the tables in LabKey and their associated APIs. These tutorials cover both approaches and the methods each one requires.
Parameters for cohort building
To build cancer cohorts, you will use the cancer participants and may want to consider:
- Disease type
- Recruited disease
- Diagnosis codes in health records
- Staging
- Treatment
- Hormone status
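The cancer parameters above act as successive filters on the cancer participants. The sketch below illustrates that idea on a few in-memory records; the `CancerParticipant` class, its field names and the values such as `"BREAST"` or `"positive"` are hypothetical stand-ins, not the real RE table schema, so adapt them to the tables you query.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified record for one cancer participant. The field names
# and values are illustrative only; they are not the RE table schema.
@dataclass
class CancerParticipant:
    participant_id: str
    disease_type: str
    stage: str
    treatments: set = field(default_factory=set)
    er_status: str = "unknown"  # hormone (oestrogen receptor) status


def select_cancer_cohort(participants, disease_type, stages,
                         er_status=None, treatment=None):
    """Return the IDs of participants who meet every criterion given.

    Criteria left as None are not applied.
    """
    cohort = []
    for p in participants:
        if p.disease_type != disease_type or p.stage not in stages:
            continue
        if er_status is not None and p.er_status != er_status:
            continue
        if treatment is not None and treatment not in p.treatments:
            continue
        cohort.append(p.participant_id)
    return cohort


participants = [
    CancerParticipant("p1", "BREAST", "II", {"surgery"}, "positive"),
    CancerParticipant("p2", "BREAST", "IV", {"chemotherapy"}, "negative"),
    CancerParticipant("p3", "COLORECTAL", "II", {"surgery"}),
    CancerParticipant("p4", "BREAST", "III", {"surgery", "radiotherapy"}, "positive"),
]

# Stage II-III, ER-positive breast cancer participants.
print(select_cancer_cohort(participants, "BREAST", {"II", "III"}, er_status="positive"))
# ['p1', 'p4']
```

Keeping each criterion optional lets you start broad (disease type only) and tighten the cohort one parameter at a time, checking how many participants remain at each step.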
For rare disease cohorts, you will use the rare disease participants and might look at:
- Recruited disease
- HPO terms associated with the participant's phenotype
- Diagnosis codes in health records
- Whether the case has been solved
- Whether the participant is still alive
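The rare disease parameters can be combined in the same way. In this minimal sketch, a participant is kept only if they were recruited for the chosen disease and have every required HPO term, with optional restrictions to unsolved cases and to living participants. The `RareDiseaseParticipant` class, its fields and the disease labels are hypothetical; HP:0001249 (Intellectual disability) and HP:0001250 (Seizure) are real HPO IDs used purely as examples.

```python
from dataclasses import dataclass


# Hypothetical, simplified rare disease participant record. Field names and
# values are illustrative, not the RE table schema.
@dataclass(frozen=True)
class RareDiseaseParticipant:
    participant_id: str
    recruited_disease: str
    hpo_terms: frozenset  # HPO term IDs recorded as present
    case_solved: bool
    alive: bool


def select_rd_cohort(participants, recruited_disease, required_hpo,
                     unsolved_only=False, alive_only=False):
    """Return IDs of participants recruited for recruited_disease who have
    every HPO term in required_hpo, optionally restricted to unsolved cases
    and/or living participants."""
    required = set(required_hpo)
    ids = []
    for p in participants:
        if p.recruited_disease != recruited_disease:
            continue
        if not required <= p.hpo_terms:  # subset test: all required terms present
            continue
        if unsolved_only and p.case_solved:
            continue
        if alive_only and not p.alive:
            continue
        ids.append(p.participant_id)
    return ids


DISEASE = "Intellectual disability"  # illustrative recruited disease label
BOTH = frozenset({"HP:0001249", "HP:0001250"})  # Intellectual disability, Seizure

participants = [
    RareDiseaseParticipant("r1", DISEASE, BOTH, case_solved=False, alive=True),
    RareDiseaseParticipant("r2", DISEASE, frozenset({"HP:0001249"}), case_solved=False, alive=True),
    RareDiseaseParticipant("r3", DISEASE, BOTH, case_solved=True, alive=True),
    RareDiseaseParticipant("r4", DISEASE, BOTH, case_solved=False, alive=False),
    RareDiseaseParticipant("r5", "Hereditary ataxia", BOTH, case_solved=False, alive=True),
]

print(select_rd_cohort(participants, DISEASE, BOTH, unsolved_only=True))
# ['r1', 'r4']
print(select_rd_cohort(participants, DISEASE, BOTH, unsolved_only=True, alive_only=True))
# ['r1']
```

Requiring all terms (a subset test) gives a narrow phenotype; if you want participants with any of several terms instead, replace the subset test with a check that the two sets intersect.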
Common disease cohorts can be constructed from rare disease relatives and cancer participants, considering:
- Diagnosis codes in health records
In all cases, you will also want to build a control cohort. To make sure no cases end up in your control group by accident, we recommend defining your controls by excluding anyone who falls into the broader super-categories that contain the specific categories you used to define your cases.
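To make the super-category rule concrete, the sketch below uses ICD-10 diagnosis codes: cases have category I21 (acute myocardial infarction), and controls must have no code anywhere in the wider block I20-I25 (ischaemic heart diseases). The input dictionary, the helper functions and the choice of codes are illustrative assumptions, not taken from the RE tables; participants with a related but non-case code fall into neither group.

```python
def normalise(code):
    """Upper-case an ICD-10 code and drop any dot, e.g. 'i21.9' -> 'I219'."""
    return code.replace(".", "").strip().upper()


def category(code):
    """The three-character ICD-10 category, e.g. 'I21.9' -> 'I21'."""
    return normalise(code)[:3]


# Illustrative definitions: cases have acute myocardial infarction (I21);
# controls must have no code anywhere in the wider ischaemic heart disease
# block I20-I25.
CASE_CATEGORIES = {"I21"}
EXCLUDED_BLOCK = ("I20", "I25")


def in_excluded_block(code):
    start, end = EXCLUDED_BLOCK
    # Plain string comparison works here because every category in the block
    # shares the letter 'I' and has two digits.
    return start <= category(code) <= end


def split_cases_controls(diagnoses):
    """diagnoses: dict of participant ID -> list of ICD-10 codes (hypothetical input)."""
    cases, controls = [], []
    for pid, codes in diagnoses.items():
        if any(category(c) in CASE_CATEGORIES for c in codes):
            cases.append(pid)
        elif not any(in_excluded_block(c) for c in codes):
            controls.append(pid)
        # Everyone else has a related code (e.g. in I25) and joins neither group.
    return cases, controls


diagnoses = {
    "a": ["I21.9", "E11.9"],  # acute myocardial infarction -> case
    "b": ["I251"],            # inside I20-I25 but not I21 -> neither
    "c": ["J45.9"],           # asthma -> control
    "d": ["I10"],             # essential hypertension, outside I20-I25 -> control
}
print(split_cases_controls(diagnoses))
# (['a'], ['c', 'd'])
```

Participant `b` shows why the rule matters: filtering controls only on "not I21" would have put someone with ischaemic heart disease into the control group.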
For any of these cohort types, you may also want to take into account:
- Sex
- Ethnicity
- Age:
  - now
  - at sampling
  - at diagnosis
  - at death
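Each of these ages is the number of whole years between a participant's date of birth and a reference date. The minimal sketch below computes them from hypothetical dates; the function names and event labels are illustrative. If your data only give a year of birth, ages derived this way will be approximate to within a year.

```python
from datetime import date


def age_in_years(birth, on):
    """Whole years completed between birth and the date on."""
    years = on.year - birth.year
    # Subtract one if the birthday has not yet been reached in that year.
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1
    return years


def participant_ages(birth, **event_dates):
    """Map each event name to the age at that event, skipping missing dates."""
    return {event: age_in_years(birth, d)
            for event, d in event_dates.items() if d is not None}


# Hypothetical dates for one participant.
ages = participant_ages(
    date(1960, 6, 15),
    now=date.today(),
    at_sampling=date(2016, 3, 1),    # age 55: birthday not yet reached
    at_diagnosis=date(2014, 6, 15),  # age 54: on the birthday itself
    at_death=None,                   # still alive, so no age at death
)
print(ages)
```

Passing `None` for an event that has not happened keeps it out of the result, so a living participant simply has no age at death.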
Cohort building methods
You can build cohorts in the RE using Participant Explorer or the LabKey API, or in CloudOS using Cohort Browser or interactive sessions. The following tutorials cover:
- Building cohorts with Participant Explorer
- Building cohorts with Cohort Browser in CloudOS
- Building cancer cohorts programmatically
- Building rare disease cohorts programmatically
Common disease cohorts
To build common disease cohorts programmatically, we recommend you follow the steps in the rare disease cohort tutorial, omitting the steps on recruited disease, HPO terms and unsolved cases.
Exporting cohort data
All the methods for creating cohorts listed here involve pulling out identifiable participant data, such as participant IDs and medical history. Therefore, you cannot export any of these tables from the RE. Cohorts created here are intended as a starting point for further analyses. Any attempt to export these tables via Airlock will be rejected, and you must not copy any of these tables by hand.
Recorded training sessions
You can also find recordings and materials from previous training sessions:
Topic | Most recent session | Link | Notebook location in RE |
---|---|---|---|
Building cancer cohorts and survival analysis | 12th March 2024 | materials | /gel_data_resources/example_scripts/workshop_scripts/cancer_cohort_2024 |
Building rare disease cohorts with matching control | 20th June 2023 | materials | /gel_data_resources/example_scripts/workshop_scripts/rare_disease_cohort_20230620 |