## What is an HPC?
A High Performance Cluster (HPC) provides centralised compute for large-scale analysis: instead of running everything on your own machine, you submit jobs to a shared pool of powerful nodes.
```mermaid
flowchart TD
    A(Researcher submits job) --> B[Master host and candidates]
    B --> C[Queues]
    C --> |jobs wait in queues until the required resources are ready| D[Resources]
    D <-.-> |Master host and resources are in frequent communication| B
    D --> E(Job runs and finishes)
    classDef researcher fill:#DF007D,stroke:#DF007D,color:#FFFFFF;
    class A,E researcher;
    classDef RE fill:#FFC6E6,stroke:#FFC6E6,color:#2B2F3B;
    class B,C,D RE;
```
## Overview of usage
To use the HPC, you start by logging onto the cluster. This brings you to the login node. From here you can `cd` into your working folder. It is possible to load and run software from the login node, but to make use of the full compute resources, you should launch jobs.
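A typical first session follows the steps above; this is only an illustrative sketch, where the host name, folder path, queue name and script name are placeholders rather than the real cluster values:

```shell
# Placeholder host name: replace with the real cluster address
ssh username@cluster.example.com

# Move from the login node's home area to your working folder (placeholder path)
cd /re_gecip/my_project

# Launch a batch job rather than running the analysis on the login node
# (queue name and script are placeholders)
bsub -q medium -o job.%J.out -e job.%J.err ./my_analysis.sh
```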
```mermaid
%%{init: {"flowchart": {"htmlLabels": false, 'curve': 'linear'}} }%%
flowchart TB
    subgraph "`RE`"
    direction TB
    B["Research Environment"] --> C["Terminal"]
    end
    subgraph "`HPC`"
    direction TB
    D["`**Login node** low resourced`"] -- "`create job`" --> E["`**Worker node** high resourced`"]
    end
    subgraph "`Weka`"
    direction LR
    F[Weka storage] --> G["`**discovery_forum:** read/write folder for Industry Research Network members`"]
    F --> H["`**re_gecip:** read/write folder for Academic Research Network members`"]
    F --> I["`**genomes:** read only, contains all consented genomes`"]
    F --> J["`**public_data_resources:** read only, contains public resources, eg gnomAD`"]
    F --> K["`**gel_data_resources:** read only, contains GEL-generated datasets, eg AggV2`"]
    end
    A("`Researcher`") --> RE
    C -- "`ssh`" --> HPC
    HPC --> Weka
    RE --> Weka
    classDef node fill:#FFC6E6,stroke:#FFC6E6,color:#2B2F3B;
    class A,B,C,D,E,F,G,H,I,J,K node;
```
## Terminology
| Term | Meaning |
| --- | --- |
| LSF | Load Sharing Facility, the tool we use to schedule jobs on the HPC |
| CPU | Central Processing Unit, the main processors |
| Nodes | Ephemeral storage, networking, memory and processing resources that can be consumed by virtual machine instances. Sometimes referred to as hosts. |
| Job | A task that you run on the HPC. Jobs can spawn other jobs. |
| Queue | When you submit a job, it joins a queue. You can choose which queue to join, depending on the length of the job. |
| Batch jobs | A job that you set off, which then runs independently in the background |
| Interactive jobs | A job that opens access to the HPC, allowing you to run commands and tools on the cluster |
| Running | A job that is in progress |
| Pending | A job that is waiting in the queue |
| Working directory | The folder where you put all your files |
| Standard output | Information about the job as it runs. If you were running a job normally, this would appear in the terminal; on an HPC, you should set a file to write it to. |
| Standard error | Information about errors from the job. If you were running a job normally, this would appear in the terminal; on an HPC, you should set a file to write it to. |
| Scratch | A location to write any temporary files created during the job |
| Project code | Researchers are grouped based on Research Network membership, with compute resources shared between the groups |
| Modules | Software that has been loaded onto the HPC, which you can use in your analysis |
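The batch and interactive job types above correspond to two forms of the LSF `bsub` command. A minimal sketch, in which the queue names, script name and job ID are placeholders:

```shell
# Batch job: runs independently in the background;
# standard output and error are written to files (%J expands to the job ID)
bsub -q medium -o myjob.%J.out -e myjob.%J.err ./run_analysis.sh

# Interactive job: opens a shell on a worker node so you can run
# commands and tools on the cluster directly
bsub -q inter -Is /bin/bash

# Check the state of your jobs (Running or Pending)
bjobs

# Kill a job when you have finished with it (placeholder job ID)
bkill 12345
```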
## Usage guidelines
**DO**

- DO launch interactive jobs to run software on the HPC.
- DO kill interactive jobs when you've finished with them.
- DO choose the appropriate length queue for your job.
- DO estimate the memory required for your job.
- DO use scripts to launch your batch jobs.
- DO set LSF parameters (`#BSUB`) within your scripts for improved traceability in your batch jobs.
- DO specify the location for your standard output and error to help with troubleshooting.
- DO use scratch directories for your temporary files.
- DO use containers to import software.
- DO work with interactive coding tools such as RStudio and Jupyter on the HPC.
- DO set up `.netrc` to use the LabKey API.
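Several of these recommendations can be combined in a single submission script using `#BSUB` directives. This is only a sketch: the queue name, project code, module name, memory value and paths are illustrative placeholders, and memory units depend on how the site has configured LSF.

```shell
#!/bin/bash
#BSUB -q medium                 # queue matched to the expected length of the job
#BSUB -P my_project_code        # project code for your Research Network group (placeholder)
#BSUB -J my_analysis            # job name, for traceability
#BSUB -o logs/job.%J.out        # standard output file (%J expands to the job ID)
#BSUB -e logs/job.%J.err        # standard error file
#BSUB -M 8000                   # estimated memory limit (placeholder value)

# Load software installed as a module (placeholder module name)
module load samtools

# Write temporary files to scratch, not to backed-up folders (placeholder path)
export TMPDIR=/scratch/my_username

./run_analysis.sh
```

With LSF, submit the script via standard input so the `#BSUB` directives are read: `bsub < my_script.sh`.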
**DON'T**

- DON'T run software on the login node.
- DON'T request more memory than you need.
- DON'T keep your temporary files in folders that will be backed up.