The HPC is changing

We will soon be switching to a new High Performance Cluster, called Double Helix. This will mean that some of the commands you use to connect to the HPC and call modules will change. We will inform you by email when you are switching over, allowing you to make the necessary changes to your scripts. Please check our HPC changeover notes for more details on what will change.

Using VEP with the LOFTEE plugin

Question

How can I annotate my VCF on the HPC using VEP and its LOFTEE plugin?

Answer

Using VEP+LOFTEE on the HPC

The main maintained version of VEP+LOFTEE on the HPC is currently VEP 99.

NB: please always remember to check the "Latest update" panel at the top of this page!

In particular, VEP can be run with LOFTEE using a script that performs the following steps (see Details below if needed):

module purge

module load bio/VEP/99.1-foss-2019a-Perl-5.28.1

export PERL5LIB=/resources/tools/apps/software/bio/VEP/99.1-foss-2019a-Perl-5.28.1/Plugins/loftee-GRCh38:${PERL5LIB}

vep <your_favourite_vep_command_options_and_flags>

For instance, the short example VEP+LOFTEE job mentioned in the Guide page linked above (which takes care of all of those steps) can be submitted and run on the HPC with:

bsub -q short -P <your_project_code> -cwd . -e err -o out bash /gel_data_resources/example_scripts/annotate_variants_with_vep/Helix_annotate_variants_with_vep99_and_LOFTEE.sh
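As a minimal sketch, the submission above could be wrapped so that a missing project code is caught before anything is sent to the scheduler. The function name vep_example_bsub_cmd is hypothetical (it is not provided on the HPC), and it only prints the command as a dry run rather than submitting it:

```shell
# Path of the example job, as given above.
EXAMPLE_JOB=/gel_data_resources/example_scripts/annotate_variants_with_vep/Helix_annotate_variants_with_vep99_and_LOFTEE.sh

# Hypothetical helper (not provided on the HPC): print the bsub command
# for the example job, refusing to continue without a project code.
vep_example_bsub_cmd() {
    project=${1:-}
    if [ -z "$project" ]; then
        echo "usage: vep_example_bsub_cmd <your_project_code>" >&2
        return 1
    fi
    echo "bsub -q short -P ${project} -cwd . -e err -o out bash ${EXAMPLE_JOB}"
}

# Dry run with a placeholder project code; nothing is submitted.
vep_example_bsub_cmd my_project_code
```

Once the printed command looks right, it can be copied and run as-is.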

Details

To run VEP with the LOFTEE plugin, the steps shown above do the following:

  1. Firstly, remove any other modules you may have loaded - this is currently needed on the HPC when running VEP
  2. Then, simply load the module for VEP 99
  3. Run the "export" line to add the relevant version of LOFTEE to the Perl library variable (PERL5LIB), which is required by the LOFTEE plugin. Note that you need to choose GRCh37 or GRCh38 here because, due to how the plugin's code is written, the code for processing GRCh37 and GRCh38 data is found at different locations:
    1. For GRCh37 data, you'll need to change that line to "export PERL5LIB=/resources/tools/apps/software/bio/VEP/99.1-foss-2019a-Perl-5.28.1/Plugins/loftee-GRCh37:${PERL5LIB}"
    2. For GRCh38 data, that line is ready to be used
  4. Run VEP as usual, following both VEP and LOFTEE documentation, and using the variables that are relevant to your case in the "--plugin" option line.
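The choice between the two LOFTEE locations in step 3 can be sketched as a small shell helper. This is only an illustration: the function name loftee_dir_for_assembly and the variables VEP_ROOT, ASSEMBLY and LOFTEE_DIR are hypothetical, while the two paths are the ones quoted in the steps above.

```shell
# Installation prefix of the VEP 99 module, taken from the paths above.
VEP_ROOT=/resources/tools/apps/software/bio/VEP/99.1-foss-2019a-Perl-5.28.1

# Hypothetical helper: print the LOFTEE directory for a given assembly.
loftee_dir_for_assembly() {
    case "$1" in
        GRCh37) echo "${VEP_ROOT}/Plugins/loftee-GRCh37" ;;
        GRCh38) echo "${VEP_ROOT}/Plugins/loftee-GRCh38" ;;
        *) echo "Unsupported assembly: $1" >&2; return 1 ;;
    esac
}

# Set this to match the --assembly value you pass to vep.
ASSEMBLY=GRCh38
LOFTEE_DIR=$(loftee_dir_for_assembly "$ASSEMBLY") || exit 1

# Prepend the LOFTEE directory; the ${VAR:+...} form avoids a trailing
# colon when PERL5LIB was empty beforehand.
export PERL5LIB="${LOFTEE_DIR}${PERL5LIB:+:${PERL5LIB}}"
echo "PERL5LIB=${PERL5LIB}"
```

Keeping the assembly in one variable makes it harder to load the GRCh37 plugin code by accident while passing "--assembly GRCh38" to vep, or vice versa.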

The example job found at the location specified above (/gel_data_resources/example_scripts/annotate_variants_with_vep/Helix_annotate_variants_with_vep99_and_LOFTEE.sh) makes use of some of the environment variables that are set when loading the module (please note that those variables will exist only after loading the module):

Example job VEP+LOFTEE

module purge
module load bio/VEP/99.1-foss-2019a-Perl-5.28.1

export PERL5LIB=${LOFTEE38}:${PERL5LIB}

vep --input_file input.vcf.gz \
--output_file output.vcf.gz \
--vcf \
--compress_output bgzip \
--species homo_sapiens \
--assembly GRCh38 \
--offline \
--cache \
--dir_cache ${CACHEDIR} \
--cache_version 99 \
--force \
--no_stats \
--everything \
--fasta /public_data_resources/reference/GRCh38/GRCh38Decoy_no_alt.fa \
--plugin CADD,/public_data_resources/CADD/v1.5/GRCh38/whole_genome_SNVs.tsv.gz \
--plugin LoF,loftee_path:${LOFTEE38},human_ancestor_fa:${LOFTEE38HA},gerp_bigwig:${LOFTEE38GERP},conservation_file:${LOFTEE38SQL} \
--custom /public_data_resources/clinvar/20190219/clinvar/vcf_GRCh38/clinvar_20190219.vcf.gz,ClinVar,vcf,exact,0,CLNDN,CLNDNINCL,CLNDISDB,CLNDISDBINCL,CLNHGVS,CLNREVSTAT,CLNSIG,CLNSIGCONF,CLNSIGINCL,CLNVC,CLNVCSO,CLNVI

To see the environment variables that are set automatically when loading the module, simply run:

module show bio/VEP/99.1-foss-2019a-Perl-5.28.1

...and look for the lines containing a setenv() instruction, for instance:

setenv("LOFTEE38","/resources/tools/apps/software/bio/VEP/99.1-foss-2019a-Perl-5.28.1/Plugins/loftee-GRCh38")
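Because those variables exist only after the module has been loaded, it can help to check them before calling vep. The following is a minimal sketch: the helper name missing_vars is hypothetical, and the variable names checked are simply the ones used in the example job above.

```shell
# Hypothetical helper: print the name of every listed variable that is
# unset or empty, one per line.
missing_vars() {
    for name in "$@"; do
        # Indirect lookup through eval, so this also works in plain sh.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Variables used by the example job above.
MISSING=$(missing_vars CACHEDIR LOFTEE38 LOFTEE38HA LOFTEE38GERP LOFTEE38SQL)
if [ -n "$MISSING" ]; then
    # $MISSING is left unquoted on purpose so the names print on one line.
    echo "Not set (has the VEP module been loaded?):" $MISSING >&2
fi
```

If the check is placed right after the "module load" line of a job script, a forgotten or failed module load shows up as a clear message rather than as an empty path inside the "--plugin" option.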

VEP and LOFTEE warnings

Finally, note that these categories of warnings appear when running VEP with LOFTEE, and should all be harmless:

  • "Smartmatch is experimental at /tools/apps/vep/plugin/loftee-grch38/loftee/... line ..." - These are due to how the LOFTEE code is written and are harmless.
  • "Use of uninitialized value in split at /tools/apps/vep/plugin/loftee-grch38/loftee/LoF.pm line 562" and "Use of uninitialized value $number_of_exons in subtraction - at /tools/apps/vep/plugin/loftee-grch38/loftee/LoF.pm line 575" - These are due to the LOFTEE code currently mismatching the VEP code / Ensembl data, and are probably harmless.
  • "WARNING: Chromosome chrUn_KI270302v1 not found in annotation sources or synonyms on line ..." and "Reference KI270713.1:32134-32134 not found in FASTA file" - These are due to differences between the VCF data and the VEP cache data (reference sequence contigs and contig names), and should happen only for small, minor contigs, while not affecting the main chromosomes/contigs.
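To spot anything other than the warnings listed above, the job's error file can be filtered. This is a rough sketch: the helper name unexpected_warnings is hypothetical, the patterns are derived from the example messages above, and the file name err assumes the "-e err" option used in the bsub example.

```shell
# Hypothetical helper: print only the lines of a log file that do NOT
# match one of the warning categories listed above.
unexpected_warnings() {
    grep -v -E \
        -e 'Smartmatch is experimental' \
        -e 'Use of uninitialized value .*LoF\.pm' \
        -e 'not found in annotation sources or synonyms' \
        -e 'not found in FASTA file' \
        "$1"
}

# grep -v exits with status 1 when every line matched a known pattern.
if [ -f err ]; then
    unexpected_warnings err || echo "No unexpected lines in err"
fi
```

Note that the filter also hides "not found" messages for main chromosomes, which would not be harmless, so it is worth looking at the unfiltered file at least once per dataset.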

Using VEP+LOFTEE on Pegasus

The only maintained versions of VEP on Pegasus (i.e. the "old" HPC system) are currently VEP 96 and VEP 98.

NB: please always remember to check the "Latest update" panel at the top of this page!

Load VEP

On Pegasus, use "module load" to load your favourite VEP module, for example:

module load vep/98

After you do that, you can simply run VEP using the vep command plus options as usual, per VEP documentation.

Show VEP variables

To see the list of environment variables that are set, and the dependencies that are loaded, when you load your favourite VEP module on Pegasus, use "module show" (the list grows as users request more plugins) - for example:

module show vep/98

...which outputs the following (note that this will be different for VEP 96 and VEP 98):

/tools/envmodules/modulefiles/vep/98:

module load samtools/1.10
module load perl/5.24
setenv LOFTEE37 /tools/apps/vep/plugin/loftee-master/loftee/
setenv LOFTEE37SQL /tools/apps/vep/plugin/loftee-master/conservation_file/phylocsf_gerp.sql
setenv LOFTEE37HA /tools/apps/vep/plugin/loftee-master/human_ancestor/human_ancestor.fa.gz
setenv LOFTEE37GERP /tools/apps/vep/plugin/loftee-master/GERP/GERP_scores.final.sorted.txt.gz
setenv LOFTEE38 /tools/apps/vep/plugin/loftee-grch38/loftee/
setenv LOFTEE38SQL /tools/apps/vep/plugin/loftee-grch38/conservation_file/loftee.sql
setenv LOFTEE38HA /tools/apps/vep/plugin/loftee-grch38/human_ancestor/human_ancestor.fa.gz
setenv LOFTEE38GERP /tools/apps/vep/plugin/loftee-grch38/GERP/gerp_conservation_scores.homo_sapiens.GRCh38.bw
setenv REVELFILE /public_data_resources/vep_resources/REVEL/new_tabbed_revel.tsv.gz
setenv GSPROGRAM /tools/apps/vep/plugin/GeneSplicer/GeneSplicer/bin/linux/genesplicer
setenv GSDATA /tools/apps/vep/plugin/GeneSplicer/GeneSplicer/human
prepend-path PATH /tools/apps/vep/98/ensembl-vep
prepend-path PERL5LIB /tools/apps/vep/plugin/VEP_plugins
prepend-path PERL5LIB /tools/apps/vep/98/ensembl-vep/.vep/Plugins
prepend-path PERL5LIB /tools/apps/vep/98/ensembl-vep

All of those variables are available after loading the VEP module, and they are meant to make it easier for you to refer to important data used by VEP or by the plugins - see next section for an example.

The rest of this FAQ focuses on using VEP with the LOFTEE plugin; however, some information (including the usage of "module help" below) applies to using VEP in general.

Show VEP help (includes LOFTEE)

To get some minimal help for running VEP, including an example that shows the LOFTEE usage, run for example:

module help vep/98

...which outputs the following (again, this will be different for VEP 96 and VEP 98):

```bash
----------- Module Specific Help for 'vep/98' ---------------------

Modulefile for VEP 98
More info at: http://www.ensembl.org/info/docs/tools/vep/index.html

See 'module show vep/98' for a list of environment variables set by this module

Example command to run VEP with the LOFTEE plugin on GRCh38 data (see URL above and LOFTEE docs):

export PERL5LIB=$PERL5LIB:${LOFTEE38}

vep \
--input_file ~/input.vcf \
--format vcf \
--offline \
--cache \
--cache_version 98 \
--dir_cache /tools/apps/vep/98/ensembl-vep/.vep \
--species homo_sapiens \
--assembly GRCh38 \
--verbose \
--no_stats \
--fasta /public_data_resources/reference/GRCh38/GRCh38Decoy_no_alt.fa \
--plugin LoF,loftee_path:${LOFTEE38},human_ancestor_fa:${LOFTEE38HA},gerp_bigwig:${LOFTEE38GERP},conservation_file:${LOFTEE38SQL} \
--force_overwrite \
--output_file ~/output.vcf

```

You can see that the example makes use of some of the variables that are set when loading the module (once again, please note that those variables will exist only after loading the module).

In particular, if you want to use the LOFTEE plugin, the help suggests that you:

  1. Firstly, run the "export" line to add the relevant version of LOFTEE to the Perl library variable (note: you need to choose GRCh37 or GRCh38 here because, due to how the plugin's code is written, the code for processing GRCh37 and GRCh38 data is found at different locations):
    1. For GRCh37 data, you'll need to change that line to "export PERL5LIB=$PERL5LIB:${LOFTEE37}"
    2. For GRCh38 data, that line is ready to be used
  2. Run VEP as usual, following both VEP and LOFTEE documentation, and using the variables that are relevant to your case in the "--plugin" option line.
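On Pegasus the same choice can be made through the LOFTEE37 and LOFTEE38 variables set by the module, rather than through full paths. A minimal sketch, in which the helper name loftee_var_for_assembly and the variables ASSEMBLY and LOFTEE_DIR are hypothetical:

```shell
# Hypothetical helper: print the value of LOFTEE37 or LOFTEE38 (both set
# by "module load vep/98") for the given assembly name.
loftee_var_for_assembly() {
    case "$1" in
        GRCh37) echo "${LOFTEE37:-}" ;;
        GRCh38) echo "${LOFTEE38:-}" ;;
        *) echo "Unsupported assembly: $1" >&2; return 1 ;;
    esac
}

# Set this to match the --assembly value you pass to vep.
ASSEMBLY=GRCh37
LOFTEE_DIR=$(loftee_var_for_assembly "$ASSEMBLY")

if [ -n "$LOFTEE_DIR" ]; then
    # Appended to PERL5LIB, as in the module help example.
    export PERL5LIB="${PERL5LIB:+${PERL5LIB}:}${LOFTEE_DIR}"
else
    echo "LOFTEE variable for ${ASSEMBLY} is not set; load the VEP module first" >&2
fi
```

The empty-value check matters here because an unset LOFTEE variable would otherwise be appended to PERL5LIB silently as an empty entry.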

VEP and LOFTEE warnings

Finally, note that these categories of warnings appear when running VEP with LOFTEE, and should all be harmless:

  • "Smartmatch is experimental at /tools/apps/vep/plugin/loftee-grch38/loftee/... line ..." - These are due to how the LOFTEE code is written and are harmless.
  • "Use of uninitialized value in split at /tools/apps/vep/plugin/loftee-grch38/loftee/LoF.pm line 562" and "Use of uninitialized value $number_of_exons in subtraction - at /tools/apps/vep/plugin/loftee-grch38/loftee/LoF.pm line 575" - These are due to the LOFTEE code currently mismatching the VEP code / Ensembl data, and are probably harmless.
  • "WARNING: Chromosome chrUn_KI270302v1 not found in annotation sources or synonyms on line ..." and "Reference KI270713.1:32134-32134 not found in FASTA file" - These are due to differences between the VCF data and the VEP cache data (reference sequence contigs and contig names), and should happen only for small, minor contigs, while not affecting the main chromosomes/contigs.

Last updated

This page was last updated on 12 Feb 2021.