I'm interested in a gene and I want to know what phenotypes are related
How to use this page
Below you can switch between three categories: no code, existing tools and from scratch. Please select the version that matches your skills and the scale of the task you want to do.
| Category | Scale | Skills required | Approach | Intended users |
|---|---|---|---|---|
| no code | small | basic IT skills | uses no-code tools in the RE | clinicians and biologists without coding or command line skills |
| existing tools | large | command line and limited coding | uses pipelines generated in-house to carry out standardised analyses | bioinformaticians/computational biologists doing standard analyses |
| from scratch | large | command line, coding and common bioinformatics tools | illustrates the steps you might follow using common bioinformatics tools to carry out custom analyses | bioinformaticians/computational biologists doing custom analyses |
The instructions in each section include links to the relevant pages in the documentation. Links are tagged as:
- Tools - descriptive
- Data - descriptive
- Pre-made workflows
- Reference lists/tables
The IVA variant browser allows you to search for variants by various filters, including gene and region. Look at:
Find phenotypes associated with participants
You can search for participants by ID using Participant Explorer, and find phenotypes associated with them. Have a look at:
- Participant Explorer
- Search for participants
- Accessing and comparing medical history data with Participant Explorer
You can also explore participants and their phenotypes using LabKey:
Once you've familiarised yourself with the tools, you can use them to create a cohort of participants with variants in your gene of interest, and identify the phenotypes linked to those participants.
You can compile the data you've found in a text editor, but if you prefer a word processor or spreadsheet, we have LibreOffice available:
You may be able to perform some statistical analysis on your data and identify correlations using LibreOffice Calc. However, most analyses will require coding on the HPC. Please take a look at the other sections for help with this.
Working with the HPC
You will need to work on the HPC for any large-scale analyses. You can learn more about the HPC and how to access it:
There are folders on the HPC for your GECIP domain or Discovery forum. You should use your relevant folder as your working directory. These are also accessible from the desktop:
The Small Variant and Structural Variant pipelines
There are prebuilt pipelines to extract all the participants with variants in specified genes, covering either small variants or structural variants. Both pipelines include examples you can use to test them.
Find phenotypes associated with participants
You can find the phenotypes associated with these participants using LabKey. You will need to familiarise yourself with the clinical data we have available and with the LabKey API, which you will use to access it.
Once you have run the gene-variant or SV/CNV workflow with your list of genes, you will need to analyse the phenotypes associated with the participants you have identified.
We provide support for coding in Python or R in the RE. Interactive coding tools such as RStudio and Jupyter notebooks are available on the HPC:
We have a tutorial on getting medical history for participants which may be useful for finding phenotypes.
You can further analyse the phenotypes you have identified using Python or R, or with LibreOffice Calc.
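As an illustration of the kind of follow-up analysis described above, the sketch below counts how many distinct participants carry each phenotype term. It assumes you have already exported one row per participant-phenotype pair. The field names `participant_id` and `hpo_term` and the example rows are hypothetical placeholders, not real RE column names or data.

```python
from collections import defaultdict


def count_participants_per_phenotype(rows, id_key="participant_id", term_key="hpo_term"):
    """Return {term: number of distinct participants recorded with that term}."""
    participants_by_term = defaultdict(set)
    for row in rows:
        # A set per term means duplicate rows for one participant count once.
        participants_by_term[row[term_key]].add(row[id_key])
    return {term: len(ids) for term, ids in participants_by_term.items()}


# Hypothetical example rows (placeholders only).
example_rows = [
    {"participant_id": "P001", "hpo_term": "Seizure"},
    {"participant_id": "P001", "hpo_term": "Seizure"},  # duplicate record
    {"participant_id": "P002", "hpo_term": "Seizure"},
    {"participant_id": "P002", "hpo_term": "Microcephaly"},
]

counts = count_participants_per_phenotype(example_rows)

# List terms from most to least common, breaking ties alphabetically.
for term, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{term}\t{n}")
```

Counting distinct participants rather than rows avoids inflating a term that was recorded several times for the same person.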
If you prefer to work with the VCF files directly, you can find out more about our gVCFs and aggregate VCFs:
You can find out more about the file structure where these are located and your own working directories here:
Use tools on the HPC
You will find tools like BCFtools installed on the HPC, which you can use to explore the VCFs.
- High Performance Cluster (HPC)
- Accessing the HPC
- Using software on the HPC
- How to submit jobs to LSF
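To give a flavour of exploring a VCF with BCFtools, here is a minimal sketch that builds a `bcftools query` command listing the samples carrying a non-reference genotype within a region. The VCF path and region are hypothetical placeholders, the input is assumed to be bgzipped and indexed (required for `-r`), and on the HPC you would normally run the resulting command inside a submitted job rather than interactively.

```python
import shlex


def bcftools_carriers_command(vcf_path, region):
    """Build a bcftools command listing per-sample genotypes in a region."""
    # Raw string: bcftools itself interprets the \t and \n escapes.
    fmt = r"%CHROM\t%POS\t%REF\t%ALT[\t%SAMPLE=%GT]\n"
    return [
        "bcftools", "query",
        "-r", region,          # restrict to the region (needs an indexed file)
        "-i", 'GT="alt"',      # keep only genotypes with an alternate allele
        "-f", fmt,
        vcf_path,
    ]


# Hypothetical placeholders: substitute your own file and gene coordinates.
command = bcftools_carriers_command("my_cohort.vcf.gz", "chr1:1000000-1100000")

# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(command))
```

Building the command as a list keeps the quoting unambiguous if you later pass it to `subprocess.run`.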
Filter for consented samples
To ensure you are working only with consented samples, you may need to carry out some filtering steps on your VCFs. There are details of how to do this with the aggregated VCFs.
If you are working with the gVCFs, you will need to use LabKey and the current data version to filter.
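The general idea of restricting a VCF to consented samples can be sketched as below. This is only a generic illustration, not the documented RE procedure: it assumes you already have the sample names from the VCF header and a list of consented sample IDs for the current data release, and all IDs shown are hypothetical.

```python
def split_by_consent(vcf_samples, consented_ids):
    """Split VCF sample names into (keep, drop), preserving VCF order."""
    consented = set(consented_ids)
    keep = [sample for sample in vcf_samples if sample in consented]
    drop = [sample for sample in vcf_samples if sample not in consented]
    return keep, drop


# Hypothetical sample names and consent list (placeholders only).
vcf_samples = ["S1", "S2", "S3", "S4"]
consented_ids = ["S4", "S1", "S9"]

keep, drop = split_by_consent(vcf_samples, consented_ids)

# One sample name per line, the layout expected by `bcftools view -S`.
sample_file_text = "\n".join(keep) + "\n"
print(sample_file_text, end="")
print(f"{len(drop)} sample(s) would be excluded")
```

Saving `sample_file_text` to a file gives you a sample list that could be passed to `bcftools view -S` to subset the VCF.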
Phenotypes associated with participants
You can also use LabKey to map participants to phenotypes, including HPO terms associated with rare disease, ICD10 codes in medical history and the disease participants were recruited for. We have tutorials on using the LabKey API to build cohorts based on phenotypes and fetching medical history for participants:
- Building cancer cohorts programmatically
- Building rare disease cohorts programmatically
- Accessing medical history data programmatically
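Mapping participants to phenotypes typically comes down to querying a phenotype table for a set of participant IDs. The sketch below only builds the SQL text for such a query. The table name `participant_phenotype_table` and the columns `participant_id` and `phenotype_term` are hypothetical stand-ins, so check the tutorials above for the real table and column names and for how to submit the query through the LabKey API. It also assumes participant IDs are integers.

```python
def build_phenotype_sql(participant_ids, table="participant_phenotype_table"):
    """Build a SQL query selecting phenotype rows for the given participants."""
    # int() rejects anything that is not a whole number, so no free text
    # can end up inside the query string.
    ids = sorted({int(pid) for pid in participant_ids})
    if not ids:
        raise ValueError("at least one participant ID is required")
    id_list = ", ".join(str(pid) for pid in ids)
    return (
        "SELECT participant_id, phenotype_term "
        f"FROM {table} "
        f"WHERE participant_id IN ({id_list})"
    )


# Hypothetical participant IDs (placeholders only).
sql = build_phenotype_sql([300, 100, 300])
print(sql)
```

Deduplicating and sorting the IDs keeps the query short and makes it reproducible between runs.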
Create or import pipelines
You can analyse and combine these data in any way you choose, using any of the programming languages provided on the HPC. We also provide conda environments for working with Python and R libraries.
If you have your own pipelines written as containers, you can use Singularity to bring them into the RE.
Compile text and figures
You can use LibreOffice Calc to create figures and tables. You can also write any notes in LibreOffice Writer.
The only way to get the results of your analysis out of the RE is through Airlock. You should include any notes you have made by hand. It is your responsibility to ensure your data conforms to the Airlock rules and does not contain any identifying data.