
The HPC is changing

We will soon be switching to a new High Performance Cluster, called Double Helix. This will mean that some of the commands you use to connect to the HPC and call modules will change. We will inform you by email when you are switching over, allowing you to make the necessary changes to your scripts. Please check our HPC changeover notes for more details on what will change.

Jupyter Lab on the HPC

Jupyter Lab is available within the HPC. This allows you to perform interactive script development and data analysis within an HPC compute node.

Jupyter Lab is installed on Genomics England's HPC under the 2022_base Anaconda3 environment. To access it, you will need to log into the HPC. For a complete list of the packages available in the environment, activate it and run conda list:

source /resources/conda/miniconda3/bin/activate
conda activate 2022_base
conda list

Jupyter Lab sessions must be launched on a compute node, within an interactive session. If you launch your session from a login node, you risk disrupting work being performed by other researchers. Login nodes are named phpgridzlogn00N, whereas compute nodes follow the phpgridzlsfeNUM naming convention.
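
A quick way to check where you are before launching anything is to compare the current hostname against the two naming conventions above. The sketch below assumes that the prefixes phpgridzlogn and phpgridzlsfe are the only ones in use; node_type is a hypothetical helper name, not a command provided on the HPC.

```shell
# Classify a hostname as a login node, a compute node, or unknown,
# based on the naming conventions described above.
# node_type is an illustrative helper, not a site-provided command.
node_type() {
  case "$1" in
    phpgridzlogn*) echo "login" ;;
    phpgridzlsfe*) echo "compute" ;;
    *)             echo "unknown" ;;
  esac
}
```

Running node_type "$(hostname)" in your terminal should print "compute" before you start Jupyter Lab.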

Summary

To connect to a Jupyter Session on the HPC:

  1. Open a terminal and connect to the HPC
  2. Navigate to your working folder
  3. Start an interactive bash session on an HPC compute node and activate the Anaconda environment
  4. Launch a "headless" Jupyter Lab session
  5. Open a new terminal in the Research Environment
  6. Create an SSH tunnel
  7. Connect to the Jupyter session in Firefox from within the Research Environment

You can do this manually, or if you use Jupyter regularly, you may prefer to set up a bash function.

Manual process

Items that you will need to keep track of during this process

NUM
  Example: http://phpgridzlsfeNUM.cluster
  Notes: There are multiple HPC compute nodes that your session may be sent to, for example phpgridzlsfe011, phpgridzlsfe012 or phpgridzlsfe031. Take note of the three-digit NUM; you will need it when establishing a connection from the Research Environment to the HPC.

PROJECT_CODE
  Examples: re_gecip_cardiovascular for the Cardiovascular GECIP; re_df_illumina for Illumina
  Notes: Your project code is needed to submit any jobs to the HPC. For a full list of the project codes please review the table on the following page of the user guide.

REMOTE_PORT
  Examples: 8998, 11011
  Notes: The HPC port that the service will run on. The port number is user-defined. Choosing a port that differs from those most often used in documentation (e.g. 8888, 5000, 8000 or 9000) lowers the risk of interference with other services running on the HPC.

HOST_PORT
  Examples: 8998, 11011
  Notes: The Research Environment port that you will use to connect to the running HPC service in Firefox: http://localhost:HOST_PORT/lab?token=TOKEN or http://127.0.0.1:HOST_PORT/lab?token=TOKEN. HOST_PORT and REMOTE_PORT can be set to different numbers, but we generally recommend using the same number to simplify access.

TOKEN
  Example: http://phpgridzlsfeNUM.cluster:REMOTE_PORT/lab?token=TOKEN
  Notes: The authentication key needed to establish the connection to the Jupyter Lab session. It is included in the URL generated by the Jupyter Lab server.

CONNECTION_URL
  Example: http://127.0.0.1:HOST_PORT/lab?token=TOKEN or http://localhost:HOST_PORT/lab?token=TOKEN
  Notes: The URL to enter into Firefox to connect to your session. The two URLs are equivalent. You can copy the http://127.0.0.1 URL from the HPC terminal output (replacing the port with HOST_PORT if it differs from REMOTE_PORT), or copy just the token, go to localhost:HOST_PORT and enter the TOKEN in the password request box.
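
The port guidance above can be turned into a small check to run before launching a session. This is only a sketch: is_sensible_port is a hypothetical helper name, the 1024 to 65535 range is our assumption (ports below 1024 are normally reserved for privileged services), and passing the check does not prove the port is free on the compute node.

```shell
# Return success if the port is a whole number in the unprivileged range
# (assumed here to be 1024-65535) and is not one of the commonly
# documented defaults mentioned above.
# is_sensible_port is an illustrative helper, not a site-provided command.
is_sensible_port() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;          # empty or not a whole number
    8888|5000|8000|9000) return 1 ;;  # commonly documented defaults
  esac
  [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}
```

For example, is_sensible_port 11011 succeeds, while is_sensible_port 8888 fails.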

Creating a Jupyter Lab session

First open a terminal and connect to the HPC. Navigate to your working folder.

Create an interactive session:

bsub -P <PROJECT_CODE> -M 25G -Is -q inter bash

When this is ready you will see:

Job <job_id> is submitted to queue <inter>.
<<Waiting for dispatch ...>>
<<Starting on phpgridzlsfeNUM.cluster>>
[user@corp.gel.ac@phpgridzlsfeNUM ~]$

Take note of the three-digit NUM. You will need this later.
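
If you would rather not read the number off the prompt, it can be pulled out of the hostname. The sketch below assumes the compute node naming convention described earlier (phpgridzlsfe followed by three digits); node_num is a hypothetical helper name.

```shell
# Print the three-digit NUM from a compute node hostname such as
# phpgridzlsfe011 or phpgridzlsfe011.cluster; print nothing otherwise.
# node_num is an illustrative helper, not a site-provided command.
node_num() {
  printf '%s\n' "$1" | sed -n 's/^phpgridzlsfe\([0-9][0-9][0-9]\).*$/\1/p'
}
```

Inside the interactive session, node_num "$(hostname)" should print the NUM to use when creating the SSH tunnel.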

Now activate the anaconda environment and launch your Jupyter Lab session:

source /resources/conda/miniconda3/bin/activate
conda activate 2022_base
jupyter lab --no-browser --ip="*" --port=REMOTE_PORT

By default Jupyter Lab runs on port 8888, the same default port used by a number of other tools such as Jupyter Notebooks and RStudio. Choose your own port number so that your session does not clash with other tools.

Once the Jupyter Lab session has been launched you will see:

[I 2022-02-17 16:24:18.264 ServerApp] jupyterlab | extension was successfully linked.
[I 2022-02-17 16:24:18.737 ServerApp] nbclassic | extension was successfully linked.
[W 2022-02-17 16:24:18.764 ServerApp] WARNING: The Jupyter server is listening on all IP addresses and not using encryption. This is not recommended.
[I 2022-02-17 16:24:18.776 ServerApp] nbclassic | extension was successfully loaded.
[I 2022-02-17 16:24:18.777 LabApp] JupyterLab extension loaded from /resources/conda/miniconda3/envs/2021_base_clone/lib/python3.7/site-packages/jupyterlab
[I 2022-02-17 16:24:18.777 LabApp] JupyterLab application directory is /nas/weka.gel.zone/resources/conda.actual/miniconda3/envs/2021_base_clone/share/jupyter/lab
[I 2022-02-17 16:24:18.780 ServerApp] jupyterlab | extension was successfully loaded.
[I 2022-02-17 16:24:18.781 ServerApp] Serving notebooks from local directory: /nas/weka.gel.zone/home/USERNAME
[I 2022-02-17 16:24:18.781 ServerApp] Jupyter Server 1.13.5 is running at:
[I 2022-02-17 16:24:18.781 ServerApp] http://phpgridzlsfeNUM.cluster:REMOTE_PORT/lab?token=TOKEN
[I 2022-02-17 16:24:18.781 ServerApp]  or http://127.0.0.1:REMOTE_PORT/lab?token=TOKEN
[I 2022-02-17 16:24:18.781 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2022-02-17 16:24:18.787 ServerApp]

    To access the server, open this file in a browser:
        file:///nas/weka.gel.zone/home/USERNAME/.local/share/jupyter/runtime/jpserver-186812-open.html
    Or copy and paste one of these URLs:
        http://phpgridzlsfeNUM.cluster:REMOTE_PORT/lab?token=TOKEN
     or http://127.0.0.1:REMOTE_PORT/lab?token=TOKEN
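
The URL printed by the server refers to the compute node and REMOTE_PORT, whereas Firefox in the Research Environment needs 127.0.0.1 and HOST_PORT. As a sketch of that translation, the hypothetical helper connection_url below rewrites one into the other; the token abc123 used in the example is made up.

```shell
# Rewrite a URL printed by the Jupyter server into the URL to open in
# Firefox: point it at 127.0.0.1 and swap in the local HOST_PORT.
#   $1: URL copied from the server output
#   $2: HOST_PORT
# connection_url is an illustrative helper, not a site-provided command.
connection_url() {
  printf '%s\n' "$1" | sed "s#^http://[^:/]*:[0-9]*/#http://127.0.0.1:$2/#"
}
```

For example, connection_url "http://phpgridzlsfe011.cluster:11011/lab?token=abc123" 8998 prints http://127.0.0.1:8998/lab?token=abc123.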

Connecting to your HPC session within the Research Environment

You will need to create an SSH tunnel to the compute node. Open a new terminal in the Research Environment, keeping the first one open.

Establish the SSH tunnel:

ssh -4 -L HOST_PORT:phpgridzlsfeNUM.cluster:REMOTE_PORT USERNAME@corp.gel.ac@phpgridzlogn00N.int.corp.gel.ac

In the above command, USERNAME@corp.gel.ac@phpgridzlogn00N.int.corp.gel.ac is your usual HPC login.

Enter your password when prompted.
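
To avoid mistakes when substituting the placeholders, it can help to assemble the tunnel command from your noted values and review it before running it. The sketch below only prints the command; tunnel_cmd is a hypothetical helper name, and the login string is whatever you normally use to reach the HPC.

```shell
# Print (without running) the SSH tunnel command for the given values.
#   $1: HOST_PORT   $2: NUM   $3: REMOTE_PORT   $4: your usual HPC login
# tunnel_cmd is an illustrative helper, not a site-provided command.
tunnel_cmd() {
  printf 'ssh -4 -L %s:phpgridzlsfe%s.cluster:%s %s\n' "$1" "$2" "$3" "$4"
}
```

For example, tunnel_cmd 11011 011 11011 USERNAME@corp.gel.ac@phpgridzlogn00N.int.corp.gel.ac prints the command shown above with HOST_PORT, NUM and REMOTE_PORT filled in.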

Now launch Firefox. Go back to the first terminal you had open, copy the URL and paste it into Firefox. The URL will look like http://127.0.0.1:REMOTE_PORT/lab?token=TOKEN. If your HOST_PORT differs from your REMOTE_PORT, replace the port in the URL with HOST_PORT.

Now you can work with Jupyter.

Set up bash functions

If you use Jupyter regularly, you can put most of these commands within bash functions.

## User specific functions
function interactiveq(){
  # Launch an interactive cluster session
  # (use your own project code in place of "bio" if it differs)
  bsub -P bio -M 25G -Is -q inter bash
}

function jpt(){
  # Activate the conda environment
  source /resources/conda/miniconda3/bin/activate
  conda activate 2022_base

  # Launch a headless Jupyter Lab session on port $1.
  # Requires port forwarding on the host.
  jupyter-lab --no-browser --ip="*" --port="$1"
}

Here $1 in the jpt function is the port number that you want to launch the service on, for example:

jpt 11011

In the Research Environment, you should have the following function:

## User specific functions
function jpttnl(){
  # Connect to the HPC login node and forward the Jupyter session port
  # from the compute node.
  #   -i: use the specified SSH key
  #   -4: force IPv4 to prevent attempts to use IPv6 and prevent bind errors
  #   -L: forward local port $1 to port $1 on compute node phpgridzlsfe$2
  # Replace <username> and the N in phpgridzlogn00N with your usual HPC login details.
  ssh -i ~/.ssh/cluster -4 -L "$1:phpgridzlsfe$2:$1" <username>@corp.gel.ac@phpgridzlogn00N.int.corp.gel.ac
}

The above command requires that you have created an SSH Key to simplify your HPC access as recommended here.

Here $1 is the port number, which for simplicity is used for both the host port and the remote port, and $2 is the compute node number, for example:

jpttnl 11011 011
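
Because jpttnl takes two positional arguments, it is easy to swap them or drop the leading zero from the node number. A minimal sketch of a guard you could call first is shown below; check_jpttnl_args is a hypothetical helper name, and the three-digit rule is taken from the NUM convention described earlier.

```shell
# Check the two arguments expected by jpttnl: a port number and a
# three-digit compute node number. Prints a usage hint to stderr and
# returns non-zero if either looks wrong.
# check_jpttnl_args is an illustrative helper, not a site-provided command.
check_jpttnl_args() {
  case "$1" in
    ''|*[!0-9]*) echo "usage: jpttnl PORT NUM (PORT must be a number)" >&2; return 1 ;;
  esac
  case "$2" in
    [0-9][0-9][0-9]) return 0 ;;
  esac
  echo "usage: jpttnl PORT NUM (NUM is three digits, e.g. 011)" >&2
  return 1
}
```

For example, check_jpttnl_args 11011 011 && jpttnl 11011 011 only opens the tunnel when both values have the expected shape.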