Home directory contents¶
Your Home directory contains your working folders, the genomic data and other data resources. It is accessed by clicking the 'Home' application on the desktop. You can also access your home directory directly through the 'terminal' application.
Home directory structure¶
The Home directory can be accessed via the file system.
Your working directory should be either your ~/re_gecip (academia) or ~/discovery_forum (industry) folder. These are accessible from either the HPC or the desktop. They have significant amounts of storage, allowing you to store your working data, scripts and results.
There is a 10 GB limit on your home directories on both the AWS desktop environment and the HPC. Avoid storing data in them: if you fill them up, you may be unable to log in to the Research Environment (RE).
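To keep an eye on the limit described above, a small helper along these lines may be useful. It is a minimal bash sketch, not an official tool: the function name `check_usage` and the 80% warning threshold are illustrative assumptions, and only the 10 GB figure comes from the text.

```shell
#!/usr/bin/env bash
# Report how much space a directory uses and warn when it nears a limit.
# The 10 GB default comes from the text above; the 80% warning threshold
# and the function name are illustrative assumptions.

check_usage() {
    local dir="$1"
    local limit_kb="${2:-10485760}"   # 10 GB expressed in KiB (10 * 1024 * 1024)
    local used_kb

    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 2; }

    # du -sk prints the total size in 1024-byte units, followed by the path
    used_kb=$(du -sk "$dir" 2>/dev/null | cut -f1)
    echo "${dir}: ${used_kb} KiB used of ${limit_kb} KiB"

    # Warn (and return 1) once usage reaches 80% of the limit
    if [ "$used_kb" -ge $(( limit_kb * 80 / 100 )) ]; then
        echo "WARNING: ${dir} is above 80% of its limit" >&2
        return 1
    fi
    return 0
}
```

Calling `check_usage "$HOME"` prints the current usage and returns a non-zero status when the directory is close to full.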
In your home directory there are links to a number of important folders. These folders are:
Folder | Desktop path | HPC path | Description | Access |
---|---|---|---|---|
genomes | ~/genomes | /genomes | All the genomic data provided by our sequence partner Illumina. | read-only |
gel_data_resources | ~/gel_data_resources | /gel_data_resources | Outputs from the Genomics England internal pipelines | read-only |
pgen_public_data_resources | ~/pgen_public_data_resources | /pgen_public_data_resources | Public data resources such as 1000 Genomes data, reference genomes, example scripts etc. | read-only |
specific shared folder | ~/<group_name> | /<group_name> | Backed-up working space for each group (e.g. re_gecip). There are several petabytes of storage space for use and collaboration. | read-write |
All of these folders are also mounted on the HPC at root (/) so you can access them when running programs on the HPC. Your home folder in the Research Environment desktop is NOT available on the HPC. Please use the specific group share instead.
If you attempt to write anything to genomes, gel_data_resources or pgen_public_data_resources you will get a 'permission denied' error. Note that this also happens if you gunzip a file in one of these folders without specifying an output location, because gunzip writes the decompressed file next to the original by default. Consider using the following command instead: gunzip -c file_name > /path/to/output.file
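The gunzip advice above can be wrapped in a small bash function. This is a sketch under stated assumptions: the function name `decompress_to` is hypothetical, and the paths you pass in are your own. It relies only on `gunzip -c`, which writes the decompressed data to standard output and leaves the source file untouched.

```shell
#!/usr/bin/env bash
# Decompress a gzipped file from a read-only folder into a writable one.
# decompress_to is a hypothetical helper name; both arguments are placeholders
# that you replace with your own paths.

decompress_to() {
    local src="$1"        # a .gz file, e.g. somewhere under a read-only folder
    local out_dir="$2"    # a writable directory, e.g. inside your group share
    local name

    # Strip the directory and the .gz suffix to get the output file name
    name=$(basename "$src" .gz)

    mkdir -p "$out_dir" || return 1

    # -c sends the decompressed data to stdout, so nothing is written
    # next to the source file and the source itself is not removed
    gunzip -c "$src" > "${out_dir}/${name}"
}
```

The first argument is the compressed file and the second is a directory you can write to, such as a sub-folder of your group share.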
Using the /genomes/ folder¶
Genomes are stored in the /genomes/by_date/ folder. Any given /genomes/by_date/<YYYY-MM-DD> directory can hold several hundred deliveries of data.
To find the relevant genomic files for your project, you should use our LabKey tables or Participant Explorer.
Do not traverse through the /genomes/ directory to locate inputs for your studies, as you risk finding genomes of participants who have since withdrawn their consent. Any request to export these data via Airlock will be rejected. You should always consult LabKey to retrieve the latest list of consented genomes.
Research Network shared working space¶
If you are a member of the Research Network, you will be able to read and write to the re_gecip folder. Use this folder as your working space. Within this folder are sub-folders categorised by GECIP domain (e.g. neurology, cardiovascular, skin, etc.). You will be able to see all of these sub-folders; however, you will only have read-write access to the sub-folders of domains that you are a member of.
We are in the process of updating the re_gecip folders to match the new Research Network structure.
The re_gecip folder is mounted on the HPC, so any files and folders you save here will be accessible from the HPC. We recommend saving all your work to your domain folder within the re_gecip folder, as you have a much larger storage allocation there. How you organise the domain's shared working space is entirely up to you!
Discovery forum shared working space¶
Each Discovery Forum group will have its own specific shared folder, which should be used as the shared working space. This folder has several petabytes of storage available and is mounted on the HPC at root. Access to each folder is restricted to the members of that particular Discovery Forum group.
Temporary files¶
Please be aware that some tools within the Research Environment produce transient or temporary files. The configuration of the HPC means that the /tmp location on the cluster can rapidly become unavailable, which severely impacts other users of the resource. You should instead create a directory within the /re_scratch/ location for your temporary files. We have generated the re_gecip and discovery_forum parent directories; find your GECIP or Discovery Forum location within these and create your own directory at the end of the path. The resulting path for your TMPDIR would be:
/re_scratch/re_gecip/<your_GECIP>/<your_username>
Then set the location of this temporary file directory in your .bashrc or as an environment variable within your script:
export TMPDIR=/re_scratch/re_gecip/<your_GECIP>/<your_username>
We recommend that you set this in your .bashrc so that the environment variable is always available in your profile. Using a private scratch location ensures that your temporary files remain both accessible and private.
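The steps above (create your own directory, then point TMPDIR at it) could be combined as in the following bash sketch. The function name `setup_tmpdir` is hypothetical, and the `chmod 700` step is an assumption about how you might keep the directory private; check it against your group's conventions before relying on it.

```shell
#!/usr/bin/env bash
# Create a personal scratch directory and point TMPDIR at it.
# setup_tmpdir is a hypothetical helper name. The chmod 700 step is an
# assumption: it restricts the directory to its owner.

setup_tmpdir() {
    local scratch_dir="$1"   # your own directory under /re_scratch/

    # Create the directory (and any missing parents) if it does not exist yet
    mkdir -p "$scratch_dir" || return 1

    # Assumption: owner-only permissions keep temporary files private
    chmod 700 "$scratch_dir" || return 1

    # Tools that honour TMPDIR will now write temporary files here
    export TMPDIR="$scratch_dir"
}
```

Call the function with the scratch path shown earlier, after replacing the placeholders with your own GECIP (or Discovery Forum) folder and username. Placing the function and its call in your .bashrc applies the setting to every new shell.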
As the scratch location is designed to be used for the temporary storage of transient and intermediary files needed by analyses, we are not able to guarantee that these files will be covered by the Research Environment's backup processes or would be recoverable beyond one month. We strongly advise that the location be reviewed prior to launching new analyses to ensure that any files that are no longer required are cleared from the location.