Using containers within the Research Environment¶
You can work with containers in the RE using Singularity. A limited number of container repositories can be accessed and require the use of a proxy. For security reasons we cannot allow pushing out of the environment.
This page will highlight some best practices to work with containers within the Research Environment.
Licensing considerations
Please note if you choose a self-install route you will be solely and fully responsible for acquiring any licences required for the use of and access to the relevant software package. GEL expect all software to be correctly licensed by the researcher where the self-installation route is employed. In no event shall GEL be liable to you or any third parties for any claim, damages or other liability, whether such liability arises in contract, tort (including negligence), breach of statutory duty, misrepresentation, restitution and on an indemnity basis or otherwise, arising from, out of or in connection with software self-installed by the researcher or the use or other dealings by the researcher in the software.
Any links to third party software available on this User Guide are provided “as is” without warranty of any kind, either expressed or implied, and such software is to be used at your own risk. No advice or information, whether oral or written, obtained by you from us or from this User Guide shall create any warranty in relation to the software.
Loading Singularity on the HPC¶
To use Singularity on the HPC, please type the following: `module load tools/singularity/3.8.3`
Caching (Singularity)¶
Whenever you create an image with Singularity on the HPC, the files are automatically cached. The cached files are located in `/home/<username>/.singularity/`. However, if you pull or build an image from a compute node in an interactive session, the cache will be written there, which can fill up the compute node's local storage. You can redirect this location by setting the environment variable `SINGULARITY_CACHEDIR`.
For example, we recommend setting the environment variable in your `.bashrc` script as follows: `SINGULARITY_CACHEDIR="/re_gecip/my_GECIP_/username/singularity_cache/"`.
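As a sketch, the lines below could be added to your `.bashrc`; the GECIP folder and username in the path are placeholders, so adapt them to your own project folder (Discovery Forum users should use their equivalent path):

```shell
# Redirect the Singularity cache away from /home (path is a placeholder)
export SINGULARITY_CACHEDIR="/re_gecip/my_GECIP_/username/singularity_cache/"
# Make sure the cache directory exists
mkdir -p "$SINGULARITY_CACHEDIR"
```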
To view your current cache you can use the command `singularity cache list`, and `singularity cache list --all` to view all the individual blobs that have been pulled.
To clean up your cache you can use the command `singularity cache clean`.
List of available repositories¶
There are various container repositories available which have been whitelisted for the HPC. To ensure the correct use and security of our system, the default URLs are blocked by our firewall. Instead, these repositories may be accessed by Singularity using URLs that are routed via the artifactory. These artifactory URLs are as follows:
- Docker: docker-remote.artifactory.aws.gel.ac
- Quay.io: docker-quay-io.artifactory.aws.gel.ac
The URLs used inside singularity commands should be updated using the following example:
Example URL adjustment
Please refer to the documentation for the container for details on how to run it.
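As an illustration, a public quay.io reference can be rewritten to the proxied form by substituting the registry host. The image name and tag below are only examples of the substitution pattern, not a recommendation:

```shell
# Public reference (blocked by the firewall inside the RE):
img="quay.io/biocontainers/bcftools:1.13--h3a49de5_0"
# Substitute the registry host for the Artifactory proxy host
proxied="docker://${img/quay.io/docker-quay-io.artifactory.aws.gel.ac}"
echo "$proxied"
```

The same pattern applies to Docker Hub references, using `docker-remote.artifactory.aws.gel.ac` as the replacement host instead.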
Example quay.io: bcftools¶
In this example we will use bcftools 1.13, which is available on https://quay.io/repository/biocontainers/bcftools?tab=info.
First load `singularity`, pull the container and build a Singularity image so you do not need to pull the container every time. Then run the basic command, mount the `/gel_data_resources/` folder and run a simple `bcftools view` command on a VCF from our aggV2 dataset.
Some containers may be sizeable, so we recommend pulling and/or creating images via an interactive session. The bcftools container in this example is ~234 MB, but containers can easily exceed a gigabyte depending on the complexity of the software. Please also note the caching section above.
Running bcftools via containers
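The steps above can be sketched as follows. The image tag and the VCF path are placeholders, so substitute a real aggV2 file path when running this inside the Research Environment:

```shell
# Load Singularity on the HPC
module load tools/singularity/3.8.3

# Pull once via the Artifactory proxy and keep the resulting image,
# so the container does not need to be pulled every time
singularity pull bcftools_1.13.sif \
    docker://docker-quay-io.artifactory.aws.gel.ac/biocontainers/bcftools:1.13--h3a49de5_0

# Basic check that the container runs
singularity exec bcftools_1.13.sif bcftools --version

# Mount /gel_data_resources and view the first records of an aggV2 VCF
# (the VCF path below is a placeholder)
singularity exec --bind /gel_data_resources bcftools_1.13.sif \
    bcftools view /gel_data_resources/path/to/aggV2_chunk.vcf.gz | head
```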
Mounting drives and environment variables¶
In the above example we use the `--bind` argument to mount the `/genomes` folder to the container. By default, containers will not have the same drives mounted to them, so this needs to be added manually. An added complication of our file system is that we generally make use of relative paths. For instance, the actual path of our `/genomes/` folder is `/nas/weka.gel.zone/pgen_genomes/`. On a day-to-day basis this will not hinder you, but for containers it is something to be aware of. In fact, you will first need to `--bind` the full path, and then add another `--bind` for the relative path. As we understand that this can be rather frustrating, we provide a list of useful file paths and relative paths to ensure a path of least resistance.
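As a sketch, a double bind for the `/genomes` path might look like the command below; the image name and the VCF path are placeholders carried over from the bcftools example above:

```shell
# Bind the full (actual) path first, then add another bind for the
# relative path (image name and VCF path are placeholders)
singularity exec \
    --bind /nas/weka.gel.zone/pgen_genomes \
    --bind /genomes \
    bcftools_1.13.sif \
    bcftools view -h /genomes/path/to/sample.vcf.gz
```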
binds of interest
Below is an example where we use two of these variables to save the header of an aggV2 VCF into a .txt file. The example assumes that you have also run the initial bcftools example shown above and therefore have already built the bcftools Singularity image. Please note that you should change the file path to your own folders, and check whether you need to use the GECIP or Discovery Forum example.
Example combined mounts
Working with containers within a workflow¶
There are two ways of going about this: either pull the container directly within a task of the workflow, or create an image beforehand and let the workflow call upon that image. You can also add some of the `--bind` examples from above into the `SINGULARITY_MOUNTS` variable.
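Both approaches can be sketched as shell commands inside a workflow task; the Ubuntu image used here is purely an illustration, not a recommendation:

```shell
# Option 1: pull and run the container directly within a workflow task
# (the image is fetched, or taken from the cache, on every run)
singularity exec \
    docker://docker-remote.artifactory.aws.gel.ac/ubuntu:20.04 \
    cat /etc/os-release

# Option 2: build the image once beforehand and let the workflow call it
singularity pull ubuntu_20.04.sif \
    docker://docker-remote.artifactory.aws.gel.ac/ubuntu:20.04
singularity exec ubuntu_20.04.sif cat /etc/os-release
```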