The HPC is changing

We will soon be switching to a new High Performance Cluster, called Double Helix. This will mean that some of the commands you use to connect to the HPC and call modules will change. We will inform you by email when you are switching over, allowing you to make the necessary changes to your scripts. Please check our HPC changeover notes for more details on what will change.

About the HPC

Genomics England provides a High Performance Computing Cluster (HPC) called Helix. You should use this to run all production-worthy scripts and workflows.

About high performance computing

High performance computing (HPC) is the use of a collection of computers to process large volumes of data. The term HPC also refers to the cluster of computers itself. An HPC is a collection of computer cores organised into nodes.

HPCs are usually accessed from the command line via ssh. When you connect, you land on a login node. Login nodes have limited computing power, but they serve as a platform for scripting and job submission.

A job is a task you want to perform; it is placed in a queue and run when resources become available. For large jobs, i.e. jobs that deal with lots of data, it is important to specify the number of cores and the amount of memory needed to run your job. You can read more on how to do that, and on how to run job arrays (sets of jobs that are identical except for their input), in our HPC guidelines.
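As a rough illustration of the points above, the sketch below shows an LSF job script that requests cores and memory and runs as a job array. It is a minimal sketch, not the recommended Helix configuration: the job name, file names and resource values are hypothetical, the queue name is taken from the queue list on this page, and the memory units depend on how the cluster is configured. Check the HPC guidelines for the exact values and for any additional options your submission requires.

```shell
#!/bin/bash
# Hypothetical job script; submit it with: bsub < array_job.sh
# Lines starting with #BSUB are read by bsub as submission options.

# Job array with 10 elements (indices 1 to 10); "demo" is a made-up name.
#BSUB -J "demo[1-10]"
# Queue to submit to.
#BSUB -q short
# Number of job slots (cores) for each array element.
#BSUB -n 2
# Memory to reserve and memory limit; units depend on cluster settings.
#BSUB -R "rusage[mem=4000]"
#BSUB -M 4000
# Output and error files: %J is the job ID, %I is the array index.
#BSUB -o demo.%J.%I.out
#BSUB -e demo.%J.%I.err

# LSF sets LSB_JOBINDEX for each array element. Default to 1 so the
# script can also be tried outside the scheduler.
index="${LSB_JOBINDEX:-1}"

# Each element differs only in its input file (hypothetical naming scheme).
input="sample_${index}.txt"
echo "Array element ${index} would process ${input}"
```

Because the #BSUB lines are ordinary shell comments, the same script can be run directly on a login node as a quick syntax check before it is submitted.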

Genomics England Helix setup

For most of the nodes, Helix uses the IBM Load Sharing Facility (LSF) platform to schedule jobs, by organising each job requirement and managing the job queue in an efficient and fair way.

Each node has a fixed number of 'job slots'. A job typically consumes one slot (parallel jobs can consume more), and the standard policy is one job slot per core. Consequently, the cluster has a maximum concurrent job limit, which allows fast dispatch. In our infrastructure, each compute node can accommodate 34 concurrent batch jobs.
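To make the slot arithmetic concrete, the short sketch below works through it using the figure of 34 job slots per compute node and the 54 compute nodes listed in the Helix specifications. The four-slot parallel job is a hypothetical example, and the calculation ignores memory and other limits that also affect how jobs are placed.

```shell
#!/bin/sh
# Figures taken from this page.
slots_per_node=34
nodes=54

# Total batch job slots across the cluster.
total_slots=$((slots_per_node * nodes))

# Hypothetical parallel job that asks for 4 slots (for example bsub -n 4).
slots_per_job=4
jobs_per_node=$((slots_per_node / slots_per_job))
spare_slots=$((slots_per_node % slots_per_job))

echo "Total batch job slots: ${total_slots}"
echo "4-slot jobs per node: ${jobs_per_node} (spare slots: ${spare_slots})"
```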

For interactive jobs, we allow more job slots per node with the assumption that interactive workloads are not resource intensive. Interactive sessions should be used mostly as a way to submit jobs to the cluster, rather than for running day-to-day activities directly.
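An interactive session is itself requested through LSF. The sketch below shows one plausible way to do so, assuming the inter queue listed on this page is the interactive queue; your account may need additional options that are not covered here. The script only starts the session if bsub is available, and otherwise prints the command.

```shell
#!/bin/sh
# -Is asks LSF for an interactive job with a pseudo-terminal.
# The queue name "inter" comes from the queue list on this page.
interactive_cmd="bsub -q inter -Is /bin/bash"

if command -v bsub >/dev/null 2>&1; then
    # On the cluster: open an interactive shell on a compute node.
    $interactive_cmd
else
    echo "bsub not found; on the cluster you would run: ${interactive_cmd}"
fi
```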

Helix specifications

This is our main production grid. All workloads are expected to be submitted to this grid targeting the right queue.

Cluster name: helix
Number of physical compute nodes: 54
Operating system: CentOS 7.6.1810
Queueing system and scheduler: LSF
Number of CPU cores: 34 cores/node = 1,836
Total number of job slots: 1,836
Available queues: inter, short, medium, long

GPU node
Number of physical compute nodes: 1 GPU node (2x V100)
Operating system: CentOS 7.6.1810
Queueing system and scheduler: LSF
Access: via LSF using the reservation ID gpu1 ("brsvs" shows current reservations; you need to be added to a reservation to gain access)
Available queues: inter, short, medium, long
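The GPU node is reached through an LSF advance reservation. The sketch below shows how that might look in practice: brsvs lists current reservations, and the -U option of bsub submits a job against a reservation ID. The job script name and the choice of queue are hypothetical, and this page does not state which other options GPU jobs on Helix need, so treat it as a starting point only.

```shell
#!/bin/sh
# Reservation ID as given on this page.
reservation_id="gpu1"
# Hypothetical queue choice and job script name.
queue="short"
job_script="./gpu_job.sh"

# -U submits the job against an advance reservation.
submit_cmd="bsub -U ${reservation_id} -q ${queue} ${job_script}"

if command -v brsvs >/dev/null 2>&1; then
    # List current reservations to confirm the reservation is active.
    brsvs
fi
echo "Submission command: ${submit_cmd}"
```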

View cluster information

To view cluster information (LSF version, cluster name, master host) and check that your environment is correctly set up, run the lsid command.

lsid

IBM Spectrum LSF Standard 10.1.0.0, Jul 08 2016

Copyright International Business Machines Corp. 1992, 2016.

US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

My cluster name is cluster

My master name is phpgridzlsfm001.cluster
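If you want a script to check the environment before submitting work, the output above can be parsed. The sketch below is one way to do that, assuming the output format shown here; the function name is hypothetical.

```shell
#!/bin/sh
# Extract the cluster name from lsid-style output read on standard input.
cluster_name_from_lsid() {
    sed -n 's/^My cluster name is //p'
}

if command -v lsid >/dev/null 2>&1; then
    echo "LSF cluster: $(lsid | cluster_name_from_lsid)"
else
    echo "lsid not found: the LSF environment is not set up in this shell" >&2
fi
```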

Resources in LSF

LSF tracks resource availability and usage. Jobs can use these defined resources to request the specific resources they need.

All hosts have static numeric resources, e.g.

maxmem total physical memory
ncpus number of CPUs
maxtmp maximum available space in /tmp
cpuf CPU factor (relative performance)

All hosts also have dynamic numeric resources, e.g.

mem available memory
tmp available space in /tmp
ut CPU utilisation

Additionally, hosts can have boolean resources, such as OS and ARCH resources defined per host. This allows easy targeting of the correct platforms. Examples of specific and generic boolean resources:

ub1604 host is running Ubuntu 16.04.

dsk host has local disk /scratch with 2TB space
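The resources described above can be inspected with standard LSF query commands. The sketch below wraps them in a small hypothetical helper so that it prints the command instead of failing when LSF is not installed. The resource names that actually appear depend on how the cluster is configured.

```shell
#!/bin/sh
# Run an LSF query command if it is installed, otherwise just print it.
show_or_print() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Static resources per host (for example ncpus, maxmem, cpuf) and
# the boolean resources defined for each host.
show_or_print lshosts

# Dynamic resources per host (for example ut, tmp, mem).
show_or_print lsload

# Names and descriptions of all resources defined in the cluster.
show_or_print lsinfo
```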

Sections of a resource requirement string (-R option)

select: a logical expression built from a set of resource names, used to choose eligible hosts
order: used for host sorting and selection
rusage: used to specify resource reservations for jobs
span: specifies the locality of a parallel job
same: specifies that all processes of a parallel job must run on hosts with the same resource
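The sections above are combined into a single string that is passed to bsub with -R. The sketch below builds such a string piece by piece and prints the resulting command rather than submitting it. The values are hypothetical, memory units depend on the cluster configuration, and dsk is the example boolean resource from this page, which may not be defined on your cluster.

```shell
#!/bin/sh
# Only consider hosts with local scratch disk and enough available memory.
select_part="select[dsk && mem>8000]"
# Sort candidate hosts by CPU utilisation.
order_part="order[ut]"
# Reserve memory for the job.
rusage_part="rusage[mem=8000]"
# Place at most 2 processes of the parallel job on each host.
span_part="span[ptile=2]"
# Require all hosts used by the job to have the same host type.
same_part="same[type]"

resreq="${select_part} ${order_part} ${rusage_part} ${span_part} ${same_part}"

# Hypothetical 4-slot parallel job; printed here, not submitted.
echo "bsub -q short -n 4 -R \"${resreq}\" ./parallel_job.sh"
```

Not every job needs all five sections; a serial job would typically use only select and rusage.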

History

Helix was released in 2020, replacing its predecessor Pegasus.