The HPC is changing

We will soon be switching to a new High Performance Cluster, called Double Helix. This will mean that some of the commands you use to connect to the HPC and call modules will change. We will inform you by email when you are switching over, allowing you to make the necessary changes to your scripts. Please check our HPC changeover notes for more details on what will change.

Python packages and personal conda environments

You can use conda in the Research Environment (RE) to access pre-made Python environments as well as to create your own. To install packages into your environments, you will need to use proxy paths.

Licensing considerations

Please note if you choose a self-install route you will be solely and fully responsible for acquiring any licences required for the use of and access to the relevant software package. GEL expect all software to be correctly licensed by the researcher where the self-installation route is employed. In no event shall GEL be liable to you or any third parties for any claim, damages or other liability, whether such liability arises in contract, tort (including negligence), breach of statutory duty, misrepresentation, restitution and on an indemnity basis or otherwise, arising from, out of or in connection with software self-installed by the researcher or the use or other dealings by the researcher in the software.

Any links to third party software available on this User Guide are provided “as is” without warranty of any kind, either expressed or implied, and such software is to be used at your own risk. No advice or information, whether oral or written, obtained by you from us or from this User Guide shall create any warranty in relation to the software.

Using pre-made conda environments

The attached file is a non-exhaustive list of Python modules and packages in each conda environment. Last updated 29 May 2020.

The preferred channel is intel where possible, followed by bioconda. Ensure that your required modules appear within the same environment wherever possible.

pip/pypi is known to have broken pandas in the ipy3pypi and ipy3pypirev1 environments. The error you will see is ImportError: cannot import name 'maybe_downcast_numerical'. You should try py3pypi and ipy3nopypirev1 as alternatives.

Searching conda environments

You can query the available conda packages using a resource located at:

  • locally: ~/gel_data_resources/software_catalogues/conda_catalogue/
  • HPC: /gel_data_resources/software_catalogues/conda_catalogue/

Take a look at the README.md in that folder for instructions. You will need to use a different script depending on whether you are on the HPC or not:

  • locally: VDI_query_catalogue.sh
  • HPC: /HPC_query_catalogue.sh

Create your own conda environments

You can also create and manage your own conda environments. For security reasons, we have restricted the channels you can access to:

  • main (anaconda)
  • conda-forge
  • bioconda
  • r

We will not be expanding this list.

You will need to route both conda package requests and PyPI pip installation requests via a proxy mirror. Worked examples are shown below.

Required file

You can copy the configuration file from /gel_data_resources/example_config_files/Helix/.condarc to your $HOME location using the command:

cp /gel_data_resources/example_config_files/Helix/.condarc ~/.

This must be in your $HOME location on the HPC in order for conda to be able to route your requests correctly.
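If you already keep a personal .condarc, a plain cp would overwrite it. The following is a minimal sketch of a more cautious copy, assuming a bash-compatible shell; install_condarc is a hypothetical helper name, not a tool provided in the Research Environment.

```shell
#!/usr/bin/env bash
# Copy the example .condarc into a home directory unless one already exists.
# install_condarc is a hypothetical helper name used for illustration only.
install_condarc() {
    # $1: path to the example .condarc
    # $2: target directory (defaults to $HOME)
    local source_file="$1"
    local home_dir="${2:-$HOME}"

    if [ -e "$home_dir/.condarc" ]; then
        # Leave an existing file alone so personal settings are not lost.
        echo "exists: $home_dir/.condarc"
        return 0
    fi
    cp "$source_file" "$home_dir/.condarc" && echo "copied: $home_dir/.condarc"
}
```

For example, `install_condarc /gel_data_resources/example_config_files/Helix/.condarc` copies the file into $HOME only if no .condarc is there yet. If one does exist, you would need to compare the two files and merge the channel settings by hand.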

Process

We recommend first creating your base environment and then building on it with a series of installs.

If you wish to use a YAML definition file for an environment that you created externally, bear in mind that you will need to change the channels in the file for the process to complete.

As you will not have write access to the default install location within the HPC, you will need to ensure that you use the --prefix flag to specify the install location.

Please refer to the conda documentation for guided steps.

conda

For conda to be able to use the channel aliases you will need a copy of the .condarc file mentioned above.

Once you have this file you will be able to create your environment with the command:

conda create python==<version_number> --prefix /path/to/env/location

Once created you will need to use the env path in order to activate it:

source /resources/conda/miniconda3/bin/activate
conda activate /path/to/env/location

To install additional packages you will need to ensure that you have the correct aliases for each channel:

conda install -c conda-virtual <package_1> -c conda-main <package_2> -c conda-conda-forge <package_3> -c conda-bioconda <package_4> -c conda-r <package_5>
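As a rough illustration of how the permitted channels relate to the aliases in the command above, the sketch below maps a public channel name to its alias and prints the resulting install command without running it. The helper names channel_alias and print_install_command are hypothetical, the alias names are assumed to match your copy of the .condarc, and conda-virtual is left out because this page does not say which channel it stands for.

```shell
#!/usr/bin/env bash
# Map a public channel name to the proxy alias used with `conda install -c`.
# channel_alias is a hypothetical helper; the alias names are assumed to
# match the entries in your ~/.condarc.
channel_alias() {
    case "$1" in
        main|anaconda) echo "conda-main" ;;
        conda-forge)   echo "conda-conda-forge" ;;
        bioconda)      echo "conda-bioconda" ;;
        r)             echo "conda-r" ;;
        *)
            echo "channel '$1' is not available through the proxy" >&2
            return 1
            ;;
    esac
}

# Print (rather than run) an install command so it can be checked first.
print_install_command() {
    # $1: public channel name; remaining arguments: package names
    local proxy_alias
    proxy_alias=$(channel_alias "$1") || return 1
    shift
    echo "conda install -c $proxy_alias $*"
}
```

For example, `print_install_command bioconda samtools` prints `conda install -c conda-bioconda samtools`, where samtools is only an illustrative package name. A channel outside the permitted list makes the helper fail instead of producing a command that conda could not resolve.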

These commands should work in the same way as you would expect when run outside of the Research Environment.

Creating conda environment from a YAML definition file

It is possible to create an environment and bulk-install packages in a few simple steps. This can be useful to quickly recreate a working conda environment from outside of the Research Environment, or to use the same packages and versions as a collaborator. Due to the way access to the package repositories is mediated, there are some minor changes that need to be made to the process you would follow outside of the Research Environment.

You can export the definition file with the command:

conda env export > <environment_name>.yml

You will then need to edit the file to:

  • remove the build hashes from the packages (as mentioned above, we have only configured canonical channels, and these may not have the same hashes)
  • ensure that the channel names are the same as listed in the .condarc:
      • conda-main for anaconda
      • conda-conda-forge for conda-forge
      • conda-bioconda for bioconda
      • conda-r for r
  • move any PyPI packages to a requirements.txt file

Once these changes have been made, you will be able to create the environment with the command:
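The three edits listed above can be sketched with sed and awk. This is a minimal sketch, not a supported tool: adapt_env_yaml and extract_pip_requirements are hypothetical helper names, and the patterns assume the layout that conda env export normally writes (a channels: block before dependencies:, entries of the form "- name=version=build", and PyPI packages nested under "- pip:"). Check the output by eye before using it. Note also that conda env export accepts a --no-builds flag, which leaves out the build strings at export time.

```shell
#!/usr/bin/env bash
# Sketch: adapt a file written by `conda env export` for the proxy channels.
# Both function names are hypothetical and used for illustration only.

adapt_env_yaml() {
    # $1: exported YAML file; the adapted YAML is written to stdout.
    # sed renames the channels and strips the trailing "=build" field;
    # awk then drops the nested "- pip:" block.
    sed -e '/^channels:/,/^dependencies:/s/^\( *- \)anaconda$/\1conda-main/' \
        -e '/^channels:/,/^dependencies:/s/^\( *- \)conda-forge$/\1conda-conda-forge/' \
        -e '/^channels:/,/^dependencies:/s/^\( *- \)bioconda$/\1conda-bioconda/' \
        -e '/^channels:/,/^dependencies:/s/^\( *- \)r$/\1conda-r/' \
        -e 's/^\( *- [^=]*=[^=][^=]*\)=.*$/\1/' \
        "$1" |
    awk '
        /^ *- pip:$/ { match($0, /^ */); depth = RLENGTH; in_pip = 1; next }
        in_pip {
            match($0, /^ */)
            if (RLENGTH > depth) next   # still inside the pip block
            in_pip = 0
        }
        { print }
    '
}

extract_pip_requirements() {
    # $1: exported YAML file; one PyPI requirement per line on stdout.
    awk '
        /^ *- pip:$/ { match($0, /^ */); depth = RLENGTH; in_pip = 1; next }
        in_pip {
            match($0, /^ */)
            if (RLENGTH > depth) { sub(/^ *- /, ""); print; next }
            in_pip = 0
        }
    ' "$1"
}
```

A possible way to use the two helpers, with placeholder file names:

    adapt_env_yaml exported.yml > adapted.yml
    extract_pip_requirements exported.yml > requirements.txt

Channels other than the four listed above are left unchanged by this sketch, so you would still need to remove or replace them by hand.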

conda env create -f <env_name>.yml -p /path/to/personal/conda_envs/env_name

Once the conda environment has been created and activated you will be able to install the PyPI packages with the command:

pip install -r requirements.txt --index-url https://artifactory.aws.gel.ac/artifactory/api/pypi/pypi/simple

pip

Once you have an active conda environment you will have the option to install packages hosted by PyPI. To ensure that pip is able to access the correct proxy path you will need to use the --index-url flag with the path to the proxy:

pip install <package_name> --index-url https://artifactory.aws.gel.ac/artifactory/api/pypi/pypi/simple
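If you install several packages in one session, repeating the flag can be avoided. pip reads the PIP_INDEX_URL environment variable as the default value for --index-url, so a minimal sketch, assuming a bash-compatible shell, is:

```shell
#!/usr/bin/env bash
# Set the proxy index once for the current shell session.
# pip uses PIP_INDEX_URL as the default for its --index-url option.
export PIP_INDEX_URL="https://artifactory.aws.gel.ac/artifactory/api/pypi/pypi/simple"

# Later installs in this session then need no extra flag:
#   pip install <package_name>
```

The setting lasts only for the current shell, so it needs to be repeated (or added to a job script) wherever pip is run.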