Step-by-step guide to using containers

To use containers on the Genomics England HPC, follow the steps below:

  1. Identify the container you want to import
  2. Launch a job on the HPC
  3. Load Singularity
  4. Pull the container
  5. Mount files to the container
  6. Run the container

1. Identify the container you want to import

On the HPC, you can pull containers stored on dockerhub or quay.io. These can be publicly available containers, or containers that you build yourself and deposit there.

If you want to find publicly available containers, you may find a container listed with the documentation for the software you're using, or you may find what you need on a repository like biocontainers. Check for a dockerhub or quay.io container location on these websites.

The instructions for pulling a container from the original documentation (for example biocontainers) will not work in the RE. Please follow the instructions under Pull the container for how to pull.

2. Launch a job on the HPC

To use containers, you will need to launch a job on the HPC; this ensures you're using the full compute power of the HPC. For experimentation, we suggest using an interactive job. Once you have incorporated your container image into your pipeline, you may prefer to use a batch job.

3. Load Singularity

Singularity 3.8.3 and 4.1.1 are available on the HPC. Load 4.1.1 with

module load singularity/4.1.1

Docker containers

Docker is not available in the RE because of permissions issues. You can still use Docker containers with Singularity.

4. Pull the container

To bring the container into the RE, you will need to use a pull command. This comprises:

  • singularity pull
  • The name for the singularity file that will be created. The file is named filename.sif. You can call this whatever you choose. It will be saved to your current working directory.
  • The web location of the container

    For the location of the container, you will need to alter the URL so that it is rerouted via Artifactory.

    If you're using a dockerhub container in the form container_path, you should change it to docker://docker-remote.artifactory.aws.gel.ac/container_path.

    If you're using a quay.io container in the form quay.io/container_path, you should change it to docker://docker-quay-io.artifactory.aws.gel.ac/container_path.
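The two rewriting rules above can be sketched as a small shell function. This is an illustration only: to_artifactory_url is a hypothetical helper name, not a command provided in the RE, and it assumes that any location not starting with quay.io/ is a dockerhub path. The hostnames match those used in the example pull commands.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the RE): rewrites a public container
# location into the Artifactory-routed form described in this guide.
to_artifactory_url() {
  case "$1" in
    quay.io/*)
      # Drop the leading "quay.io/" and use the quay.io route.
      printf 'docker://docker-quay-io.artifactory.aws.gel.ac/%s\n' "${1#quay.io/}"
      ;;
    *)
      # Assumption: everything else is a dockerhub path.
      printf 'docker://docker-remote.artifactory.aws.gel.ac/%s\n' "$1"
      ;;
  esac
}

to_artifactory_url "quay.io/biocontainers/deeptools:3.5.5--pyhdfd78af_0"
to_artifactory_url "biocontainers/bcftools"
```

The printed locations can be passed as the last argument of singularity pull.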

It is possible to use singularity run on the container location straight away, without first pulling the container, but then no filename.sif file will be saved in your working directory.

Example pull commands

quay.io

Here we will pull deeptools from Biocontainers.

On the deeptools page, we can see a container location on quay.io, listed under "Docker installation":

quay.io/biocontainers/deeptools:3.5.5--pyhdfd78af_0

Even though there is also a command listed as "Singularity installation", we are not going to use this, since it uses a galaxy project repository location.

We will need to reroute this location via Artifactory, which changes it to:

docker://docker-quay-io.artifactory.aws.gel.ac/biocontainers/deeptools:3.5.5--pyhdfd78af_0

We can use this to pull with Singularity:

singularity pull my_deeptools.sif docker://docker-quay-io.artifactory.aws.gel.ac/biocontainers/deeptools:3.5.5--pyhdfd78af_0

dockerhub

Here we will pull bcftools from Dockerhub.

On dockerhub, the pull location is listed as biocontainers/bcftools.

To reroute via artifactory, we will alter this to: docker://docker-remote.artifactory.aws.gel.ac/biocontainers/bcftools.

Our full pull command is:

singularity pull my_bcftools.sif docker://docker-remote.artifactory.aws.gel.ac/biocontainers/bcftools

5. Mount files to the container

To analyse data with your container, you will need to mount or bind files to the container. This will allow the container to access the data. Do this using the --bind argument within your container's exec or run command.

relative paths

The paths you usually use to access folders in the RE are actually relative paths. When you mount paths to your container, you will need to use the full path. This means that to access the /genomes/ folder, you will need to use /nas/weka.gel.zone/pgen_genomes/.

To mount, you will need to --bind the full path, and then add another --bind for the relative path.

Some example binds

Include the relevant arguments in your run or exec commands. You can also bind specific files and folders within these listed folders:

  • genomes: containing the genomic VCF and alignment files
    --bind /nas/weka.gel.zone/pgen_genomes:/nas/weka.gel.zone/pgen_genomes --bind /genomes:/genomes

  • gel_data_resources: containing the results of Genomics England bioinformatics analysis including AggV2
    --bind /nas/weka.gel.zone/pgen_int_data_resources:/nas/weka.gel.zone/pgen_int_data_resources --bind /gel_data_resources:/gel_data_resources

  • public_data_resources: containing copies of public data
    --bind /nas/weka.gel.zone/pgen_public_data_resources:/nas/weka.gel.zone/pgen_public_data_resources --bind /public_data_resources:/public_data_resources

  • re_scratch: which you should use for all temporary files
    --bind /nas/weka.gel.zone/re_scratch:/nas/weka.gel.zone/re_scratch --bind /re_scratch:/re_scratch

  • re_gecip: the working folder for academic Research Network members
    --bind /nas/weka.gel.zone/re_gecip:/nas/weka.gel.zone/re_gecip --bind /re_gecip:/re_gecip

  • discovery_forum: the working folder for commercial Discovery Forum members
    --bind /nas/weka.gel.zone/discovery_forum:/nas/weka.gel.zone/discovery_forum --bind /discovery_forum:/discovery_forum
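Because every folder needs the same pair of binds, the arguments listed above could be generated rather than typed out. The following is a minimal sketch: bind_pair is a hypothetical helper name, not something provided in the RE, and it assumes the paths contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the pair of --bind arguments for one RE folder.
#   $1 = the short path you normally use (for example /genomes)
#   $2 = the full path it corresponds to
bind_pair() {
  printf '%s\n' "--bind $2:$2 --bind $1:$1"
}

bind_pair /genomes /nas/weka.gel.zone/pgen_genomes
```

The output can be placed in a run or exec command with an unquoted command substitution, for example singularity exec $(bind_pair /re_gecip /nas/weka.gel.zone/re_gecip) my_container.sif followed by the command to run, so that the shell splits it into separate arguments.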

6. Run the container

There are two functions for running the software in a container: run and exec.

Many containers are built with a default action: they load the relevant software and then carry out that action. If you want to carry out that action, you should use run. For example:

singularity run \
  --bind /nas/weka.gel.zone/re_gecip:/nas/weka.gel.zone/re_gecip --bind /re_gecip:/re_gecip \
  my_container.sif \
  /nas/weka.gel.zone/re_gecip/my_working_folder/my_file.tsv

If you wish to choose and run a particular function of the software in your container, you should use exec. For example:

singularity exec \
  --bind /nas/weka.gel.zone/re_gecip:/nas/weka.gel.zone/re_gecip --bind /re_gecip:/re_gecip \
  my_container.sif \
  function_in_container \
  --file /nas/weka.gel.zone/re_gecip/my_working_folder/my_file.tsv
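Steps 3 to 6 can be combined into one script for use inside a job. The sketch below makes several assumptions: the image name, working folder and VCF file name are placeholders, the bcftools executable is assumed to be on the path inside the biocontainers/bcftools image, and the step wrapper with its DRY_RUN variable is a hypothetical convenience that only prints each command unless DRY_RUN is set to 0. It also assumes the module command is available in the shell that runs the script.

```shell
#!/usr/bin/env bash
# Sketch only: names below are placeholders, not real files in the RE.
# With DRY_RUN=1 (the default here) each command is printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

step() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

IMAGE="my_bcftools.sif"
SOURCE="docker://docker-remote.artifactory.aws.gel.ac/biocontainers/bcftools"
WORKDIR="/nas/weka.gel.zone/re_gecip/my_working_folder"

# Step 3: load Singularity
step module load singularity/4.1.1

# Step 4: pull the container via Artifactory
step singularity pull "$IMAGE" "$SOURCE"

# Steps 5 and 6: bind the full and short paths, then run one command
step singularity exec \
  --bind /nas/weka.gel.zone/re_gecip:/nas/weka.gel.zone/re_gecip \
  --bind /re_gecip:/re_gecip \
  "$IMAGE" \
  bcftools view --header-only "$WORKDIR/my_file.vcf.gz"
```

Reading the printed commands first is a cheap way to check the bind paths and the rerouted location before running the script with DRY_RUN=0.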

Video tutorial