The HPC is changing

We will soon be switching to a new High Performance Cluster, called Double Helix. This will mean that some of the commands you use to connect to the HPC and call modules will change. We will inform you by email when you are switching over, allowing you to make the necessary changes to your scripts. Please check our HPC changeover notes for more details on what will change.

Variant Effect Predictor (VEP) container

VEP is a tool provided by Ensembl for functional annotation of variant consequences on genes. We currently provide versions 99, 105, 106, 109 and 111 of VEP as containers. These include the most commonly used plugins.

Description

The Variant Effect Predictor (VEP) is a comprehensive annotation tool created by the Ensembl project. It relies on the Ensembl database to provide annotation data for variants. To make the tool easier to use while keeping the data that we host secure, we have developed containerised versions of VEP that include the canonical plugins.

The container has been created from a Docker image converted to a Singularity file. The contents are listed in the section below.

VEP v.111 release

The VEP 111 container includes the following changes:

  • The AlphaMissense plugin has been included.
  • The project base image has been updated to Ubuntu 22.04.
  • The following plugins have been removed as they use Python 2.x, which is no longer supported:
      • FATHMM
      • FATHMM_MKL
      • PON_P2

We have changed the way we support the various versions of LOFTEE because of the way the new base image functions. Effectively, we need to release two separate containers, one for Build 37 and one for Build 38. These containers can be used directly from our internal mirror (Artifactory) with the following commands:

For Build 37: singularity run docker://docker-gel-research-containers.artifactory.aws.gel.ac/vep:v111_lofteeb37 vep

For Build 38: singularity run docker://docker-gel-research-containers.artifactory.aws.gel.ac/vep:v111_lofteeb38 vep
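
As the two images differ only in their tag, the choice can be wrapped in a small helper. The following is a minimal sketch: `vep_image_for_build` is a hypothetical function name, not something shipped with the container, and the image addresses are copied from the two commands above. It prints the command instead of running it, so you can check it first.

```shell
#!/bin/sh
# Registry prefix copied from the commands above.
VEP_REGISTRY="docker://docker-gel-research-containers.artifactory.aws.gel.ac/vep"

# vep_image_for_build is a hypothetical helper, not part of the container.
# It maps a genome build (37 or 38) to the matching v111 LOFTEE image.
vep_image_for_build() {
    case "$1" in
        37|GRCh37) echo "${VEP_REGISTRY}:v111_lofteeb37" ;;
        38|GRCh38) echo "${VEP_REGISTRY}:v111_lofteeb38" ;;
        *) echo "Unsupported build: $1 (expected 37 or 38)" >&2; return 1 ;;
    esac
}

# Print the command rather than running it, so it can be inspected first.
image=$(vep_image_for_build 38) && echo "singularity run ${image} vep"
```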

Contents of the VEP containers

Notes

The container includes all plugins currently available on the Ensembl VEP project's GitHub page, with the addition of LOFTEE. Other third-party plugins may be made available on a case-by-case basis. However, due to the inherent limitations within VEP, the annotation of VCFs with these third-party tools cannot be combined with the internal plugins in a single step.

It is important to note the following points:

  • The container will overwrite any existing output files. Please ensure that any prior work you have performed is backed up before launching a new annotation.
  • Annotation jobs need to be submitted to the HPC, either as an interactive job for a single file or as a batch of jobs for a list of files. Please consult this page for instructions on how to achieve this.
  • Because of the way filesystems are mounted on the HPC, you will need to mount (--bind) the full path to the locations of your input, output and other ancillary files. Please refer to the examples below to guide you.
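
Because existing output files are overwritten, it can help to move a previous result aside before each run. The sketch below shows one way to do this. `backup_if_exists` is a hypothetical helper, and the timestamp suffix and file names are illustrative assumptions, not a convention of the container.

```shell
#!/bin/sh
# backup_if_exists is a hypothetical helper, not part of the VEP container.
# If the given file exists, it is renamed with a timestamp suffix and the
# new name is printed. If it does not exist, nothing happens.
backup_if_exists() {
    if [ -e "$1" ]; then
        backup_path="$1.$(date +%Y%m%d_%H%M%S).bak"
        mv "$1" "$backup_path" && echo "$backup_path"
    fi
}

# Demonstration in a throwaway directory (the file name is a placeholder).
demo_dir=$(mktemp -d)
echo "previous results" > "$demo_dir/annotated.vcf"
backup_if_exists "$demo_dir/annotated.vcf"
rm -rf "$demo_dir"
```

In a real run you would call the function with the same path you give as the VEP output file, just before launching the annotation.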

Commercial usage

If you are a Discovery Forum member, there are some restrictions on the plugins and VEP options you are allowed to use.

Genomics England imposes no restrictions on access to, or use of, the data provided and the software used to analyse and present it.

Some of the data and software included in the distribution may be subject to third-party constraints. You are solely responsible for establishing the nature of and complying with any such restrictions.

Instructions

Access the container

For versions 99, 105, 106 and 109, copy the container to your working directory:

cp -R /gel_data_resources/example_scripts/annotate_variants_with_vep/ /<PATH_TO_YOUR_WORKING_DIRECTORY>/

Note that this will copy the configuration files for each available version of the VEP container, as well as the container itself, which has a .sif suffix. You will only need to adjust the configuration file for the version of the VEP container you intend to use.

For version 110, you should access the container via Artifactory:

singularity run docker://docker-gel-research-containers.artifactory.aws.gel.ac/vep:v110.0.1 vep

Create an input file list

If you are only annotating a single VCF file, then you only need to add this to your submission script. However, if you're annotating multiple VCFs, you need to create a file list.

You should use the latest version of LabKey to compile your list.

Example of an input file

/path/to/target/file/variant_file_001.vcf.gz
/path/to/target/file/variant_file_002.vcf.gz
/path/to/target/file/variant_file_003.vcf.gz
/path/to/target/file/variant_file_004.vcf.gz
/path/to/target/file/variant_file_005.vcf.gz

Save the file to your working directory.
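
If your VCFs are custom files in a directory rather than paths exported from LabKey, the list can also be assembled on the command line. This is a minimal sketch under that assumption: `make_vcf_list` is a hypothetical helper, and the demonstration uses a temporary directory with placeholder file names.

```shell
#!/bin/sh
# make_vcf_list is a hypothetical helper: it writes the path of every
# *.vcf.gz file under a directory to a list file, one path per line, sorted.
# Pass an absolute directory so that the listed paths are absolute too.
# Index files such as *.vcf.gz.tbi do not match the pattern and are skipped.
make_vcf_list() {
    find "$1" -type f -name '*.vcf.gz' | sort > "$2"
}

# Demonstration with empty placeholder files in a throwaway directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/variant_file_001.vcf.gz" "$demo_dir/variant_file_002.vcf.gz"
make_vcf_list "$demo_dir" "$demo_dir/input.txt"
cat "$demo_dir/input.txt"
rm -rf "$demo_dir"
```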

Edit the configuration file

The container can be accessed directly from the directory you copied. To execute it, you will need to bind the relevant file locations. These binds are stored in a file called vep_###.conf.

A sample file is stored in the folders you have copied, including most of the binds you need. However, you will need to edit some variables:

  • INPUT_FILE: The filepath to the input file in your working directory.
  • MOUNT_WD: Your working directory, where your scripts are located. This should be in the format <work_directory>:<work_directory>. For example /nas/weka.gel.zone/re_gecip/<your_folder>:/nas/weka.gel.zone/re_gecip/<your_folder> or /nas/weka.gel.zone/discovery_forum/<your_folder>:/nas/weka.gel.zone/discovery_forum/<your_folder>.

    These must include the /nas/weka.gel.zone/ prefix, and you must duplicate the filepath.
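
The two rules for MOUNT_WD (the required prefix and the duplicated path) are easy to get wrong by hand. The sketch below builds the value from a single directory and rejects paths without the prefix. `make_mount_wd` is a hypothetical helper, and `my_folder` is a placeholder for your own folder. How the value is written into the configuration file is not shown here, so check the sample file for the exact syntax.

```shell
#!/bin/sh
# make_mount_wd is a hypothetical helper, not part of the VEP container.
# It prints <dir>:<dir> for a directory under /nas/weka.gel.zone/ and
# fails for any path that lacks that prefix.
make_mount_wd() {
    case "$1" in
        /nas/weka.gel.zone/*) echo "$1:$1" ;;
        *) echo "Path must start with /nas/weka.gel.zone/: $1" >&2; return 1 ;;
    esac
}

# "my_folder" is a placeholder for your own folder name.
make_mount_wd "/nas/weka.gel.zone/re_gecip/my_folder"
```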

Edit the submission script

To annotate a single VCF file, edit the annotation script called vep_sif_annotation.sh. You will need to edit:

* `#BSUB -q <queue_name>`: the queue that you want to submit the script to. If it is the first time you are trying it, we recommend the *short* queue (`#BSUB -q short`) as it will run for four hours at most.
* `#BSUB -P <your_project_code>`: your [project code](lsf_codes.md). For example `#BSUB -P re_gecip_neurology`.
* `input =`: the file you want to annotate.
* `output_file` in the VEP options: The full file path for the output.

A sample of VEP options and plugins has been included in the submission script. You can edit the VEP command to add or remove options and plugins as specified in the Ensembl VEP documentation.

To annotate a batch of VCF files, edit the annotation script called batch_annotation.sh. You will need to edit:

* `#BSUB -q <queue_name>`: Specify the queue that you want to submit the script to. We recommend using the *medium* (24 hours max) or *long* (one week default but changeable) queues for this script (`#BSUB -q long`).
* `#BSUB -P <your_project_code>`: Specify your [project code](lsf_codes.md). For example `#BSUB -P re_gecip_neurology`.
* `output_file` in the VEP options: The full file path for the output.

A sample of VEP options and plugins has also been included in the batch submission script. You can edit the VEP command to add or remove options and plugins as specified in the Ensembl VEP documentation.

Submit the script

To annotate a single file, submit to the HPC using:

bsub < vep_sif_annotation.sh

To annotate a batch of files, submit to the HPC using:

bsub < batch_annotation.sh

Inputs and outputs

| File or folder | Description | Requires editing |
|---|---|---|
| vep_sif_annotation.sh | Single VCF file annotation | Yes, can be provided by you. Example only. |
| batch_annotation.sh | Batch VCF file annotation | Yes, can be provided by you. Example only. |
| vep.conf | Configuration file for VEP variables | Yes. Example only. |
| vep_N.sif | VEP Singularity image. As we currently host containers for v99, v105, v106 and v109, select the file name carefully so that the correct image and cache are matched up. The image file itself does not need to be edited. | No |
| input.txt | List of VCF file paths, one path per line. These will either be extracted from LabKey as part of your cohort building process, or be the /path/to/VCF/files if you are using custom-called or aggregated VCFs. | Yes |
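
Since the process expects one path per line in input.txt, a quick check before submission can catch a malformed list. The following is a sketch only: `check_vcf_list` is a hypothetical helper, and its rules (no empty lines, no whitespace within a line, absolute paths only) are assumptions about what a well-formed list looks like, not checks performed by the container.

```shell
#!/bin/sh
# check_vcf_list is a hypothetical helper, not part of the VEP container.
# It reports empty lines, lines containing whitespace (a hint that two
# paths share a line) and relative paths, and returns non-zero if any occur.
check_vcf_list() {
    status=0
    line_no=0
    while IFS= read -r line || [ -n "$line" ]; do
        line_no=$((line_no + 1))
        case "$line" in
            "") echo "line $line_no: empty line" >&2; status=1 ;;
            *[[:space:]]*) echo "line $line_no: contains whitespace" >&2; status=1 ;;
            /*) ;;  # a single absolute path: accepted
            *) echo "line $line_no: not an absolute path" >&2; status=1 ;;
        esac
    done < "$1"
    return $status
}

# Demonstration with placeholder paths written to a temporary list file.
demo_list=$(mktemp)
printf '%s\n' /path/to/target/file/variant_file_001.vcf.gz \
    /path/to/target/file/variant_file_002.vcf.gz > "$demo_list"
check_vcf_list "$demo_list" && echo "list looks well formed"
rm -f "$demo_list"
```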