Python packages and personal conda environments
You can use conda in the Research Environment (RE) to access pre-made Python environments and to create your own. To install packages into your environments, you will need to use proxy paths.
Licensing considerations
Please note if you choose a self-install route you will be solely and fully responsible for acquiring any licences required for the use of and access to the relevant software package. GEL expect all software to be correctly licensed by the researcher where the self-installation route is employed. In no event shall GEL be liable to you or any third parties for any claim, damages or other liability, whether such liability arises in contract, tort (including negligence), breach of statutory duty, misrepresentation, restitution and on an indemnity basis or otherwise, arising from, out of or in connection with software self-installed by the researcher or the use or other dealings by the researcher in the software.
Any links to third party software available on this User Guide are provided “as is” without warranty of any kind, either expressed or implied, and such software is to be used at your own risk. No advice or information, whether oral or written, obtained by you from us or from this User Guide shall create any warranty in relation to the software.
Using pre-made conda environments
The attached file is a non-exhaustive list of the Python modules and packages in each conda environment. Last updated 29 May 2020.
The preferred channel is intel where possible, followed by bioconda. Ensure that your required modules appear within the same environment wherever possible.
Packages installed with pip from PyPI are known to have broken pandas in the ipy3pypi and ipy3pypirev1 environments. The error you will see is ImportError: cannot import name 'maybe_downcast_numerical'. Try the py3pypi or ipy3nopypirev1 environments instead.
Searching conda environments
You can query the available conda packages using a catalogue stored at:
- locally: ~/gel_data_resources/software_catalogues/conda_catalogue/
- HPC: /gel_data_resources/software_catalogues/conda_catalogue/
See the README.md in that folder for instructions. The script you need depends on whether you are on the HPC:
- locally: VDI_query_catalogue.sh
- HPC: /HPC_query_catalogue.sh
Create your own conda environments
You can also create and manage your own conda environments. For security reasons, we have restricted the channels you can access to:
- main (anaconda)
- conda-forge
- bioconda
- r
We will not be expanding this list.
You will need to route both conda package requests and pip (PyPI) installation requests via a proxy mirror. Worked examples are shown below.
Required file
Copy the configuration file /gel_data_resources/example_config_files/Helix/.condarc to your $HOME directory with the command:
cp /gel_data_resources/example_config_files/Helix/.condarc ~/.
The file must be in your $HOME directory on the HPC for conda to route your requests correctly.
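The cp command above replaces any .condarc you already have in $HOME. A minimal sketch of a more cautious copy, assuming you want to keep a backup of your existing file; the function name install_gel_condarc and the backup name .condarc.bak are hypothetical and not part of the RE:

```shell
#!/usr/bin/env bash
# install_gel_condarc is a hypothetical helper, not something provided by GEL.
# It copies the RE's example .condarc into $HOME and keeps a backup of any
# existing ~/.condarc as ~/.condarc.bak.
install_gel_condarc() {
  # Source file defaults to the path given in this guide; pass another path to override.
  local src="${1:-/gel_data_resources/example_config_files/Helix/.condarc}"
  if [ -f "$HOME/.condarc" ]; then
    cp "$HOME/.condarc" "$HOME/.condarc.bak"
  fi
  cp "$src" "$HOME/.condarc"
}
```

After running install_gel_condarc, conda config --show-sources lists the configuration files conda has read, which is a quick way to confirm that it picks up ~/.condarc.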
Process
We recommend first creating a base environment and then building on top of it with a series of installs.
If you wish to use a YAML definition file for an environment that you created outside the Research Environment, bear in mind that you will need to change the channel names in it for the process to complete, as described below.
As you do not have write access to the default install location on the HPC, you must use the --prefix flag to specify an install location that you can write to.
Please refer to the conda documentation for guided steps.
conda
For conda to be able to use the channel aliases, you need a copy of the .condarc file mentioned above.
Once you have this file, you can create your environment with the command:
conda create python==<version_number> --prefix /path/to/env/location
Once it is created, activate the environment using its path:
conda activate /path/to/env/location
To install additional packages, refer to each channel by its alias. The -c flags apply to the whole command (conda searches every listed channel for each package), so list only the channels you need:
conda install -c conda-virtual -c conda-main -c conda-conda-forge -c conda-bioconda -c conda-r <package_1> <package_2> <package_3>
These commands should work in the same way as you would expect when run outside of the Research Environment.
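The create-then-activate sequence can be wrapped in a small shell function. This is a sketch, assuming the conda shell hook is loaded in your shell (so that conda activate works inside a function) and that ~/.condarc is already in place; create_personal_env is a hypothetical name:

```shell
#!/usr/bin/env bash
# create_personal_env is a hypothetical helper, not provided by GEL.
# Usage: create_personal_env <prefix> <package_spec>...
# Assumes ~/.condarc from this guide is in place and the conda shell hook is loaded.
create_personal_env() {
  local prefix="$1"   # a directory you can write to, not the default env location
  shift
  # --prefix is required because the default install location is not writable.
  conda create --prefix "$prefix" "$@" || return 1
  # Environments created with --prefix are activated by path, not by name.
  conda activate "$prefix"
}
```

For example, create_personal_env /path/to/personal/conda_envs/my_env python=3.10, followed by conda install -c conda-bioconda <package> inside the activated environment.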
Creating a conda environment from a YAML definition file
You can create an environment and bulk-install its packages in a few simple steps. This is useful for quickly recreating a working conda environment from outside the Research Environment, or for using the same packages and versions as a collaborator. Because access to the package repositories is mediated by a proxy, you need to make some minor changes to the process you would follow outside the Research Environment.
Outside the Research Environment, export the definition file of the active environment with the command:
conda env export > <environment_name>.yml
You will then need to edit the file to:
- remove the build hashes from the packages: as mentioned above, we have only configured the canonical channels, and these may not have the same hashes
- ensure that the channel names match those listed in the .condarc:
  - conda-main for anaconda
  - conda-conda-forge for conda-forge
  - conda-bioconda for bioconda
  - conda-r for r
- move any PyPI packages to a requirements.txt file
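The three edits above can be scripted. The sketch below assumes the layout that conda env export usually writes (top-level keys at column 0, two-space list items, PyPI packages nested under a "- pip:" entry) and maps the public channel name defaults to conda-main, which is an assumption rather than something stated in this guide; prepare_env_yml is a hypothetical helper. As an alternative for the first edit, conda env export --no-builds leaves out the build strings at export time.

```shell
#!/usr/bin/env bash
# prepare_env_yml is a hypothetical helper, not provided by GEL.
# Usage: prepare_env_yml <exported.yml> <cleaned.yml> <requirements.txt>
# Assumes the layout written by "conda env export": top-level keys at column 0,
# two-space list items, and PyPI packages nested under "- pip:".
prepare_env_yml() {
  local in="$1" out="$2" req="$3"
  : > "$req"   # start with an empty requirements file
  awk -v req="$req" '
    # Remember the current top-level section (channels:, dependencies:, ...).
    /^[a-z_]+:/ { section = $1; in_pip = 0 }

    # Channels: rename public channels to the aliases defined in .condarc.
    # Mapping "defaults" to conda-main is an assumption.
    section == "channels:" && /^ *- / {
      ch = $2
      if (ch == "conda-forge") ch = "conda-conda-forge"
      else if (ch == "bioconda") ch = "conda-bioconda"
      else if (ch == "r") ch = "conda-r"
      else if (ch == "defaults" || ch == "anaconda" || ch == "main") ch = "conda-main"
      print "  - " ch
      next
    }

    # The "- pip:" entry opens the nested list of PyPI packages.
    section == "dependencies:" && /^ *- pip:$/ { in_pip = 1; next }
    # Nested pip items (4+ spaces) go to requirements.txt instead of the YAML.
    in_pip && /^    +- / { sub(/^ *- /, ""); print > req; next }
    in_pip { in_pip = 0 }

    # Conda packages: keep name=version and drop the build string.
    section == "dependencies:" && /^ *- / {
      n = split($2, f, "=")
      print "  - " (n >= 3 ? f[1] "=" f[2] : $2)
      next
    }

    # Everything else (name:, prefix:, section headers) is copied unchanged.
    { print }
  ' "$in" > "$out"
}
```

Running prepare_env_yml <environment_name>.yml <env_name>.yml requirements.txt produces the two files used by the commands that follow. It does not cover every layout conda can emit, so review the output before creating the environment.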
Once these changes have been made, you can create the environment with the command:
conda env create -f <env_name>.yml -p /path/to/personal/conda_envs/env_name
Once the conda environment has been created and activated, you can install the PyPI packages with the command:
pip install -r requirements.txt --index-url https://artifactory.aws.gel.ac/artifactory/api/pypi/pypi/simple
pip
Once you have an active conda environment, you can also install packages hosted on PyPI. To make sure that pip uses the correct proxy path, pass the --index-url flag with the proxy URL:
pip install <package_name> --index-url https://artifactory.aws.gel.ac/artifactory/api/pypi/pypi/simple
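If you install from PyPI often, pip can also read the index URL from the PIP_INDEX_URL environment variable (pip maps its long options to PIP_<OPTION> variables), so you do not have to repeat the flag. A sketch, assuming you set it in the shell session or startup file you use inside the RE:

```shell
#!/usr/bin/env bash
# Point pip at the GEL PyPI proxy for every pip command in this shell.
# pip reads PIP_INDEX_URL as if --index-url had been passed on the command line.
export PIP_INDEX_URL="https://artifactory.aws.gel.ac/artifactory/api/pypi/pypi/simple"

# With the variable set, these behave like the commands above:
#   pip install <package_name>
#   pip install -r requirements.txt
```

Passing --index-url explicitly still works and takes precedence for that single command.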