I'm interested in a gene and I want to know what phenotypes are related

How to use this page

Below you can switch between three categories: no code, existing tools and from scratch. Please select the version that matches your skills and the scale of the task you want to do.

Category	Scale	Skills needed	Overview	Audience
no code	small	basic IT skills	uses no-code tools in the Research Environment (RE)	clinicians and biologists without coding or command line skills
existing tools	large	command line and limited coding	uses pipelines generated in-house to carry out standardised analyses	bioinformaticians/computational biologists doing standard analyses
from scratch	large	command line, coding and common bioinformatics tools	illustrates the steps you might follow using common bioinformatics tools to carry out custom analyses	bioinformaticians/computational biologists doing custom analyses

The instructions in each section include links to the relevant pages in the documentation. Links are tagged as:

Tutorials
Tools - descriptive
Data - descriptive
Pre-made workflows
Reference lists/tables

no code | existing tools | from scratch

Find participants

The IVA variant browser allows you to search for variants by various filters, including gene and region. Look at:

Find phenotypes associated with participants

You can search for participants by ID using Participant Explorer, and find phenotypes associated with them. Have a look at:

You can also explore participants and their phenotypes using LabKey:

LabKey

Once you've familiarised yourself with the tools, you can use them to create a cohort of participants with variants in your gene of interest, and identify the phenotypes linked to those participants.

You can compile the data you've found in a text editor, but if you prefer a word processor or spreadsheet, we have LibreOffice available:

LibreOffice

You may be able to perform some statistical analysis on your data and identify correlations using LibreOffice Calc. However, most analyses will require coding on the HPC. Please take a look at the other sections for help with this.

Working with the HPC

You will need to work on the HPC for any large-scale analyses. You can learn more about the HPC and how to access it:

There are folders on the HPC for your GECIP domain or Discovery forum. You should use your relevant folder as your working directory. These are also accessible from the desktop:

Home directory contents

The Small Variant and Structural Variant pipelines

There are prebuilt pipelines that extract all the participants with variants in the genes you specify: one for small variants and one for structural variants (SVs and CNVs). Both pipelines come with examples you can use to test them.

Find phenotypes associated with participants

You can find the phenotypes associated with these participants using LabKey. You will need to familiarise yourself with the clinical data we have available and with the LabKey API, which you will use to access it.

Once you have run the gene-variant or SV/CNV workflow with your list of genes, you will need to analyse the phenotypes associated with the participants you have identified.
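As a rough illustration of the step above, the sketch below builds a SQL query for the HPO terms recorded against a list of participant IDs, then sends it through the LabKey Python API. The table name `rare_diseases_participant_phenotype`, the column names, the `'Yes'` value for `hpo_present`, the `lists` schema and the `labkey` context path are all assumptions made for illustration. Check them against the data dictionary for your release, and supply your own server and project path.

```python
def build_phenotype_sql(participant_ids, table="rare_diseases_participant_phenotype"):
    """Return SQL selecting present HPO terms for the given participants.

    The table and column names are assumptions; adjust them to your release.
    """
    # Casting to int both de-duplicates cleanly and rejects non-numeric input.
    ids = sorted({int(pid) for pid in participant_ids})
    if not ids:
        raise ValueError("participant_ids is empty")
    id_list = ", ".join(str(pid) for pid in ids)
    return (
        "SELECT participant_id, hpo_id, hpo_term "
        f"FROM {table} "
        f"WHERE hpo_present = 'Yes' AND participant_id IN ({id_list})"
    )


def fetch_rows(server, project, sql, schema="lists"):
    """Run the query with the LabKey Python API and return the result rows.

    `server` and `project` are placeholders you must supply; the "labkey"
    context path and the "lists" schema are assumptions.
    """
    # Imported here so the query builder can be used without the labkey package.
    from labkey.api_wrapper import APIWrapper

    api = APIWrapper(server, project, "labkey", use_ssl=True)
    result = api.query.execute_sql(schema, sql, max_rows=100000)
    return result["rows"]
```

The participant IDs would come from the output of the gene-variant or SV/CNV workflow. Keeping the query builder separate from the API call lets you inspect the SQL before running it.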

We provide support for coding in Python or R in the RE. Interactive coding tools such as RStudio and Jupyter notebooks are available on the HPC:

We have a tutorial on getting medical history for participants which may be useful for finding phenotypes.

Accessing medical history data programmatically

You can further analyse the phenotypes you have identified using Python or R, or with LibreOffice Calc.
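One simple form of further analysis is to count how many distinct participants in your cohort carry each phenotype. The sketch below assumes the phenotype records are a list of dictionaries with `participant_id` and `hpo_term` keys; that shape and those key names are assumptions, not a documented format.

```python
from collections import defaultdict


def participants_per_term(rows):
    """Count distinct participants per phenotype term.

    Returns (term, count) pairs, most frequent first, ties broken alphabetically.
    """
    seen = defaultdict(set)
    for row in rows:
        # A set means a term recorded twice for one participant counts once.
        seen[row["hpo_term"]].add(row["participant_id"])
    counts = {term: len(pids) for term, pids in seen.items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

The resulting list can be written to a CSV file with Python's `csv` module and opened in LibreOffice Calc if you want to chart it.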

Find VCFs

If you prefer to work with the VCF files directly, you can find out information about our gVCFs and aggregate VCFs:

You can find out more about the file structure where these are located and your own working directories here:

Home directory contents

Use tools on the HPC

You will find tools like BCFtools installed on the HPC, which you can use for exploring the VCFs.
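For example, you might pull out the records that overlap your gene before doing anything else. The sketch below only builds a `bcftools view` command as an argument list; the file paths and the region are placeholders, and the input VCF must be bgzip-compressed and indexed for region queries to work.

```python
import re

# Accepts regions of the form contig:start-end, e.g. chr2:1000-2000.
REGION_PATTERN = re.compile(r"^[\w.]+:\d+-\d+$")


def region_extract_command(vcf_path, region, out_path):
    """Build a bcftools command that writes one region to a compressed VCF."""
    if not REGION_PATTERN.match(region):
        raise ValueError(f"region must look like contig:start-end, got {region!r}")
    return [
        "bcftools", "view",
        "-r", region,    # restrict output to this region (needs an index)
        "-Oz",           # write bgzip-compressed VCF
        "-o", out_path,
        vcf_path,
    ]
```

You could run the command with `subprocess.run(cmd, check=True)` once BCFtools is available in your session, or print it with `shlex.join(cmd)` to paste into a job script.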

Filter for consented samples

To ensure you are working only with consented samples, you may need to carry out some filtering steps on your VCFs. There are details of how to do this with the aggregated VCFs.

AggV2 code book

If you are working with the gVCFs, you will need to use LabKey and the current data version to filter.
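A minimal sketch of that filtering step for the gVCF case, assuming you have already fetched two things from LabKey: a mapping of sample ID to participant ID, and the participant IDs present in the current data release. Treating presence in the current release as the test for consent is an assumption here; follow the documented procedure if it differs.

```python
def consented_samples(sample_to_participant, current_participants):
    """Return the sample IDs whose participant is in the current release."""
    allowed = set(current_participants)
    return sorted(
        sample for sample, participant in sample_to_participant.items()
        if participant in allowed
    )


def write_sample_list(samples, path):
    """Write one sample ID per line, the layout bcftools expects for -S."""
    with open(path, "w") as handle:
        handle.write("".join(f"{sample}\n" for sample in samples))
```

The file written by `write_sample_list` can be passed to `bcftools view -S` to subset a multi-sample VCF to those samples.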

Phenotypes associated with participants

You can also use LabKey to map participants to phenotypes, including HPO terms associated with rare disease, ICD10 codes in medical history and the disease participants were recruited for. We have tutorials on using the LabKey API to build cohorts based on phenotypes and fetching medical history for participants:

Create or import pipelines

You can analyse and combine these data in any way you choose, using any of the programming languages provided on the HPC. We also provide conda environments for working with Python and R libraries.

If you have your own pipelines written as containers, you can use Singularity to bring them into the RE.

Using containers within the Research Environment

Compile text and figures

You can use LibreOffice Calc to create figures and tables, and LibreOffice Writer to write up any notes.

LibreOffice

Export

The only way to get the results of your analysis out of the RE is through Airlock. You should include any notes you may have made by hand. It is your responsibility to ensure your data conforms to the Airlock rules and does not contain any identifying data.