Filesystem

The filesystem in the RE contains folders that you can access from both the HPC and the desktop. These include your working directory, the genomic data and other data resources. You should use the mounted ~/re_gecip or ~/discovery_forum folders as your working directory.

Home directory structure

The Home directory can be accessed from the desktop

Your working directory

The desktop home directory contains various folders, such as documents and downloads. You should not use these as your working folders, as they are not accessible on the HPC. Your working directory should be either your ~/re_gecip (academia) or ~/discovery_forum (industry) folder. You can access these from either the HPC or the desktop. They have significant amounts of storage, allowing you to store your working data, scripts and results.

There is a 10 GB limit on your home directory on the AWS desktop environment and on the login node of the HPC. You should avoid using these home directories, because if you fill them up you may be unable to log in to the RE. You also cannot access the desktop home directory from the HPC, or the login node home directory from the desktop.
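As a rough way to keep an eye on the 10 GB limit, the sketch below compares the size of a directory against a threshold. The function name `within_limit` is hypothetical and not part of the Research Environment. It assumes the standard `du` and `cut` utilities are available.

```shell
# Return 0 if the directory is within the limit, 1 (with a warning) if over.
# Usage: within_limit <dir> <limit_kb>
within_limit() {
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
    used_kb=$(du -sk "$1" | cut -f1)
    if [ "$used_kb" -gt "$2" ]; then
        printf 'warning: %s uses %s KB (limit %s KB)\n' "$1" "$used_kb" "$2" >&2
        return 1
    fi
    return 0
}
```

For example, `within_limit "$HOME" 10485760` checks your home directory against 10 GB expressed in kilobytes (10 x 1024 x 1024).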

In your home directory there are links to a number of important folders. These folders are:

| Folder | Desktop path | HPC path | Description | Access |
| --- | --- | --- | --- | --- |
| genomes | ~/genomes | /genomes | All the genomic data provided by our sequencing partner Illumina. | read-only |
| gel_data_resources | ~/gel_data_resources | /gel_data_resources | Outputs from the Genomics England internal pipelines. | read-only |
| pgen_public_data_resources | ~/pgen_public_data_resources | /pgen_public_data_resources | Public data resources such as 1000 Genomes data, reference genomes, example scripts etc. | read-only |
| specific shared folder | ~/<group_name> | /<group_name> | Backed-up working space for each group (e.g. re_gecip). There are several petabytes of storage space for use and collaboration. | read-write |

All of these folders are also mounted on the HPC at root ( / ) so you can access them when running programs on the HPC. Your home folder in the Research Environment desktop is NOT available on the HPC. Please use the specific group share instead.

If you attempt to write anything to genomes, gel_data_resources or pgen_public_data_resources you will get a 'permission denied' error. Please note that this will happen if an attempt is made to gunzip a file with no output directory specified. Consider using the following command instead: gunzip -c file_name > /path/to/output.file
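The `gunzip -c` approach above can be wrapped in a small helper, sketched here. The function name `decompress_to` is hypothetical. It assumes the source file name ends in `.gz` and that the output directory is somewhere you can write to, such as your group share.

```shell
# Decompress a gzipped file from a read-only folder into a writable one.
# Usage: decompress_to <source.gz> <output_dir>
decompress_to() {
    src="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    # Strip the directory and the .gz suffix to name the output file.
    out_file="$out_dir/$(basename "$src" .gz)"
    # -c writes to stdout, so nothing is created next to the source file.
    gunzip -c "$src" > "$out_file" || return 1
    # Print the path of the decompressed copy.
    printf '%s\n' "$out_file"
}
```

The source file is left untouched, so this works even when the source sits in a read-only folder.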

Using the /genomes/ folder

Genomes are stored in the /genomes/by_date/ folder. Any given /genomes/by_date/<YYYY-MM-DD> directory can hold several hundred deliveries of data.

To find the relevant genomic files for your project, you should use our LabKey tables or Participant Explorer.

Do not traverse through the /genomes/ directory to locate inputs for your studies, as you risk finding genomes of participants who have since withdrawn their consent. Any request to export these data via Airlock will be rejected. You should always consult LabKey to retrieve the latest list of consented genomes.
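Once you have a list of file paths from LabKey, you can check that each one is readable without browsing /genomes/ yourself. This is a minimal sketch, assuming you have saved the paths to a plain text file with one path per line. Both that file format and the function name `check_paths` are assumptions for illustration.

```shell
# Report which paths listed in a file are missing or unreadable.
# Usage: check_paths <list_file>; returns non-zero if any path fails.
check_paths() {
    list_file="$1"
    status=0
    # The "|| [ -n ... ]" clause handles a final line with no trailing newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        # Skip blank lines.
        [ -z "$entry" ] && continue
        if [ ! -r "$entry" ]; then
            printf 'missing or unreadable: %s\n' "$entry" >&2
            status=1
        fi
    done < "$list_file"
    return "$status"
}
```

Because the input comes from LabKey, the check only ever touches files on the current consented list.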

Research Network shared working space

If you are a member of the Research Network, you will be able to read and write to the re_gecip folder. Use this folder as your working space. Within this folder are sub-folders categorised by GECIP domain (e.g. neurology, cardiovascular, skin, etc.). You will be able to see all of these sub-folders, but you will only have read-write access to the sub-folders of the domains you are a member of.

We are in the process of updating the re_gecip folders to match the new Research Network structure.

The re_gecip folder is mounted on the HPC, so any files and folders you save here will be accessible from the HPC. We recommend saving all your work to your domain folder within the re_gecip folder, as you have a much larger storage allocation there. How you organise the domain's shared working space is entirely up to you!

Discovery forum shared working space

Each industry Research Network company has its own specific shared folder in discovery_forum, which should be used as the shared working space. This folder has several petabytes of storage available and is mounted on the HPC at root. Access to each folder is restricted to its members.

Temporary files

Please be aware that some tools within the Research Environment produce transient or temporary files. Because of how the HPC is configured, the /tmp location on the cluster can rapidly become unavailable, which severely impacts other users of the resource. You should instead create a directory within the /re_scratch/ location for your temporary files. We have generated the re_gecip and discovery_forum parent directories. Find your GECIP or Discovery Forum location within these and create your own directory at the end of the path. The resulting path for your TMPDIR would be:

/re_scratch/re_gecip/<your_GECIP>/<your_username>

Set the location of this temporary file directory in your .bashrc, or as an environment variable within your script:

export TMPDIR=/re_scratch/re_gecip/<your_GECIP>/<your_username>

We recommend that you set this in your .bashrc so that the environment variable is generally accessible to your profile. Using a private scratch location will ensure that your temporary files remain both accessible and private.
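The steps above (create your own directory, then point TMPDIR at it) can be sketched as a single helper. The function name `setup_scratch_tmpdir` is hypothetical. Setting owner-only permissions with `chmod 700` is an assumption about how you might keep the directory private, not a documented requirement.

```shell
# Create a private scratch directory and point TMPDIR at it.
# Usage: setup_scratch_tmpdir <scratch_dir>
setup_scratch_tmpdir() {
    scratch_dir="$1"
    mkdir -p "$scratch_dir" || return 1
    # Assumption: owner-only permissions are what you want for privacy.
    chmod 700 "$scratch_dir" || return 1
    # Tools that honour TMPDIR will now write their temporary files here.
    export TMPDIR="$scratch_dir"
}
```

You would call it with your own scratch path, following the /re_scratch/re_gecip/<your_GECIP>/<your_username> pattern shown above, either from your .bashrc or at the top of a job script.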

As the scratch location is designed to be used for the temporary storage of transient and intermediary files needed by analyses, we are not able to guarantee that these files will be covered by the Research Environment's backup processes or would be recoverable beyond one month. We strongly advise that the location be reviewed prior to launching new analyses to ensure that any files that are no longer required are cleared from the location.
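To help with reviewing the location before a new analysis, the sketch below lists files that have not been modified for a given number of days. The function name `list_stale_files` and the 30-day default are illustrative assumptions. It only prints candidates and does not delete anything.

```shell
# List regular files under a directory not modified in more than N days.
# Usage: list_stale_files <scratch_dir> [days]   (days defaults to 30)
list_stale_files() {
    scratch_dir="$1"
    days="${2:-30}"
    # -mtime +N matches files last modified more than N whole days ago.
    find "$scratch_dir" -type f -mtime +"$days" -print
}
```

Review the output before removing anything, since other steps in a pipeline may still need some of the older files.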