Building cohorts¶

These tutorials will take you through the methods you can use to build cohorts using phenotypes. You can use this information to build tables of participants who share particular phenotypes and extract information such as their identifiers, genome file locations, phenotypes and covariates such as age, sex and ethnicity.

You can search for participants using a no-code interface in Participant Explorer, or using the tables in LabKey and its associated APIs. These tutorials cover both approaches and some of the methods they require.

Parameters for cohort building¶

For cancer cohorts, you will use the cancer participants and may want to consider:

Disease type
- Recruited disease
- Diagnosis codes in health records
Staging
Treatment
Hormone status

For rare disease cohorts, you will use the rare disease participants and might look at:

Recruited disease
HPO terms associated with the participant's phenotype
Diagnosis codes in health records
If the case has been solved
If the participant is still alive

Common disease cohorts can be constructed from rare disease relatives and cancer participants, considering:

Diagnosis codes in health records

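In all cases, you will want to build a control cohort. To avoid accidentally including cases in your control group, we recommend building the controls by excluding every participant who falls in the super-category of the more specific categories you use for your case cohort. For example, if your cases are defined by one specific diagnosis code, exclude from the controls anyone with any code in that code's parent category.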
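The exclusion rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the participant IDs, the `diagnoses` mapping and the `CASE_CODES` and `SUPER_CATEGORY` values are all hypothetical, and it assumes diagnosis codes are hierarchical strings (ICD-10 style), so that a parent category can be matched by prefix.

```python
# Hypothetical data: participant ID -> diagnosis codes found in health records.
diagnoses = {
    "P001": ["C50.1"],         # has the specific code of interest
    "P002": ["C50.9", "I10"],  # same super-category, different specific code
    "P003": ["I10"],           # unrelated diagnosis only
    "P004": [],                # no recorded diagnoses
}

CASE_CODES = {"C50.1"}  # assumed specific category defining the cases
SUPER_CATEGORY = "C50"  # assumed parent category to exclude from controls


def in_super_category(codes, prefix):
    """Return True if any code falls under the given parent category."""
    return any(code.startswith(prefix) for code in codes)


# Cases: participants with at least one of the specific codes.
cases = sorted(pid for pid, codes in diagnoses.items() if CASE_CODES & set(codes))

# Controls: participants with no code anywhere in the super-category.
controls = sorted(
    pid for pid, codes in diagnoses.items()
    if not in_super_category(codes, SUPER_CATEGORY)
)

print("cases:", cases)
print("controls:", controls)
```

Note that `P002` ends up in neither group: it does not match the specific case definition, but it sits in the same super-category, so it is kept out of the controls.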

For any of these cohort types, you may also want to take into account:

Sex
Ethnicity
Age
- now
- at sampling
- at diagnosis
- at death
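The four age definitions above differ only in the reference date used. The following sketch shows one way to compute them in Python. The record layout and field names (`date_of_birth`, `sampling_date`, `diagnosis_date`, `death_date`) are hypothetical, and the code assumes full dates are available; if only a year of birth is recorded, the result can only be approximate.

```python
from datetime import date
from typing import Optional


def age_in_years(birth: date, on: Optional[date]) -> Optional[int]:
    """Completed years between birth and a reference date.

    Returns None when the reference event has no date (for example, a
    participant with no recorded death).
    """
    if on is None:
        return None
    # Subtract one year if the birthday has not yet occurred in the reference year.
    before_birthday = (on.month, on.day) < (birth.month, birth.day)
    return on.year - birth.year - int(before_birthday)


# Hypothetical participant record; field names are illustrative only.
participant = {
    "date_of_birth": date(1960, 6, 15),
    "sampling_date": date(2017, 3, 1),
    "diagnosis_date": date(2015, 6, 15),
    "death_date": None,
}

dob = participant["date_of_birth"]
ages = {
    "now": age_in_years(dob, date.today()),
    "at_sampling": age_in_years(dob, participant["sampling_date"]),
    "at_diagnosis": age_in_years(dob, participant["diagnosis_date"]),
    "at_death": age_in_years(dob, participant["death_date"]),
}
print(ages)
```

Keeping the reference date as an explicit argument makes it clear which definition of age a cohort table uses, which matters when matching cases and controls on age.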

Cohort building methods¶

You can build cohorts in the RE using Participant Explorer or using the LabKey API, or in CloudOS using Cohort browser or interactive sessions. The following tutorials cover:

Common disease cohorts

To build common disease cohorts programmatically, we recommend you follow the steps in the rare disease cohort tutorial, omitting the steps on recruited disease, HPO terms and unsolved cases.

Exporting cohort data¶

All the methods for creating cohorts listed here involve pulling out identifiable participant data, such as participant IDs and medical history. Therefore, you cannot export any of these tables from the RE. Cohorts created here are intended as a starting point for further analyses. Any attempt to export these tables via Airlock will be rejected, and you must not copy any of these tables by hand.

Recorded training sessions¶

Materials and notebooks from recorded training sessions are also available:

| Topic | Most recent session | Link | Notebook location in RE |
| --- | --- | --- | --- |
| Building cancer cohorts and survival analysis | 12th March 2024 | materials | `/gel_data_resources/example_scripts/workshop_scripts/cancer_cohort_2024` |
| Building rare disease cohorts with matching control | 14th May 2024 | materials | `/gel_data_resources/example_scripts/workshop_scripts/rd_cohort_2024` |