
Building cohorts

These tutorials will take you through the methods you can use to build cohorts using phenotypes. You can use this information to build tables of participants who share particular phenotypes and extract information such as their identifiers, genome file locations, phenotypes and covariates such as age, sex and ethnicity.

You can search for participants using the no-code interface in Participant Explorer, or using the tables in LabKey and its associated APIs. These tutorials cover both approaches and some of the methods they require.

Parameters for cohort building

To build cohorts for cancer you will be using the cancer participants and may want to consider:

  • Disease type
    • Recruited disease
    • Diagnosis codes in health records
  • Staging
  • Treatment
  • Hormone status

For rare disease cohorts, you will use the rare disease participants and might look at:

  • Recruited disease
  • HPO terms associated with the participant's phenotype
  • Diagnosis codes in health records
  • Whether the case has been solved
  • Whether the participant is still alive
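
As a minimal sketch of how these criteria can be combined, the following Python filters an in-memory list of participant records. The field names (`recruited_disease`, `hpo_terms`, `case_solved`, `alive`) and the records themselves are invented for illustration; they are not the column names of the LabKey tables, so map them to whichever tables you actually query.

```python
# Hypothetical participant records; field names and values are illustrative only.
participants = [
    {"participant_id": "P001", "recruited_disease": "Intellectual disability",
     "hpo_terms": {"HP:0001249", "HP:0001250"}, "case_solved": False, "alive": True},
    {"participant_id": "P002", "recruited_disease": "Intellectual disability",
     "hpo_terms": {"HP:0001249"}, "case_solved": False, "alive": True},
    {"participant_id": "P003", "recruited_disease": "Intellectual disability",
     "hpo_terms": {"HP:0001250"}, "case_solved": True, "alive": True},
    {"participant_id": "P004", "recruited_disease": "Epilepsy",
     "hpo_terms": {"HP:0001250"}, "case_solved": False, "alive": True},
    {"participant_id": "P005", "recruited_disease": "Intellectual disability",
     "hpo_terms": {"HP:0001250", "HP:0000252"}, "case_solved": False, "alive": False},
]


def select_rare_disease_cohort(records, disease, required_hpo,
                               unsolved_only=True, alive_only=True):
    """Return the IDs of participants that match every requested criterion."""
    cohort = []
    for rec in records:
        if rec["recruited_disease"] != disease:
            continue
        # Require every listed HPO term to be recorded for the participant.
        if not set(required_hpo) <= rec["hpo_terms"]:
            continue
        if unsolved_only and rec["case_solved"]:
            continue
        if alive_only and not rec["alive"]:
            continue
        cohort.append(rec["participant_id"])
    return cohort


cohort_ids = select_rare_disease_cohort(
    participants, "Intellectual disability", {"HP:0001250"}
)
print(cohort_ids)
```

Keeping each criterion as a separate check makes it easy to relax one of them, for example passing `unsolved_only=False` to include solved cases.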

Common disease cohorts can be constructed from rare disease relatives and cancer participants, considering:

  • Diagnosis codes in health records

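In all cases you will want to build a control cohort. To avoid accidentally including cases in your control group, we recommend defining controls by excluding anyone in the super-categories of the more specific categories you use for your case cohort. For example, if your cases are selected on one specific diagnosis code, exclude from the controls anyone with any code in the broader category that contains it.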
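
The exclusion logic above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used in the tutorials: the participant IDs and code lists are invented, codes are assumed to be ICD-10 strings stored without dots, cases are defined by the specific category E11 (type 2 diabetes), and controls exclude the whole E10 to E14 diabetes mellitus block.

```python
# Hypothetical diagnosis codes per participant (ICD-10 style, stored without dots).
diagnoses = {
    "P001": ["E119", "I10"],   # type 2 diabetes: case
    "P002": ["E109"],          # type 1 diabetes: neither case nor control
    "P003": ["J45"],           # no diabetes code: control
    "P004": [],                # no codes recorded: treated as control here
    "P005": ["E149", "K219"],  # unspecified diabetes: neither case nor control
}

# Specific category that defines the cases.
CASE_PREFIXES = ("E11",)
# Super-category (the E10-E14 block) that must be absent in controls.
EXCLUDE_PREFIXES = ("E10", "E11", "E12", "E13", "E14")


def has_code(codes, prefixes):
    """True if any code starts with one of the given prefixes."""
    return any(code.startswith(prefixes) for code in codes)


cases = sorted(pid for pid, codes in diagnoses.items()
               if has_code(codes, CASE_PREFIXES))
controls = sorted(pid for pid, codes in diagnoses.items()
                  if not has_code(codes, EXCLUDE_PREFIXES))

print("cases:", cases)
print("controls:", controls)
```

Participants with a code in the super-category but not in the case category (P002 and P005 in this sketch) fall into neither group, which is the intended effect. Note that a participant with no recorded codes is not guaranteed to be free of the disease; whether to keep such participants as controls is a design decision for your study.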

For any of these cohort types, you may also want to take into account:

  • Sex
  • Ethnicity
  • Age
    • now
    • at sampling
    • at diagnosis
    • at death
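
The different age definitions differ only in the reference date. A minimal Python sketch, assuming full dates are available for each event (the data you work with may hold only a year of birth, in which case the calculation is approximate); the participant record and its field names are invented for illustration:

```python
from datetime import date


def age_in_years(date_of_birth, on_date):
    """Completed years between date_of_birth and on_date."""
    years = on_date.year - date_of_birth.year
    # Subtract one if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


# Hypothetical participant; all dates are invented.
participant = {
    "date_of_birth": date(1960, 5, 20),
    "sampling": date(2016, 3, 1),
    "diagnosis": date(2015, 11, 30),
    "death": None,  # participant is still alive
}

reference_dates = {
    "now": date.today(),
    "sampling": participant["sampling"],
    "diagnosis": participant["diagnosis"],
    "death": participant["death"],
}

# Compute an age for every event that has a date; skip missing events.
ages = {
    event: age_in_years(participant["date_of_birth"], when)
    for event, when in reference_dates.items()
    if when is not None
}
print(ages)
```

Events without a date, such as death for a living participant, are simply left out of the result rather than given a placeholder age.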

Cohort building methods

You can build cohorts in the RE using Participant Explorer or using the LabKey API, or in CloudOS using Cohort browser or interactive sessions. The following tutorials cover:

Common disease cohorts

To build common disease cohorts programmatically, we recommend you follow the steps in the rare disease cohort tutorial, omitting the steps on recruited disease, HPO terms and unsolved cases.

Exporting cohort data

All the methods for creating cohorts listed here involve pulling out identifiable participant data, such as participant IDs and medical history. Therefore, you cannot export any of these tables from the RE. Cohorts created here are intended as a starting point for further analyses. Any attempt to export these tables via Airlock will be rejected, and you must not copy any of these tables by hand.

Recorded training sessions

You can also find recorded training sessions on cohort building:

Topic | Most recent session | Link | Notebook location in RE
Building cancer cohorts and survival analysis | 12th March 2024 | materials | /gel_data_resources/example_scripts/workshop_scripts/cancer_cohort_2024
Building rare disease cohorts with matching control | 20th June 2023 | materials | /gel_data_resources/example_scripts/workshop_scripts/rare_disease_cohort_20230620