The HPC is changing
We will soon be switching to a new High Performance Cluster, called Double Helix. This will mean that some of the commands you use to connect to the HPC and call modules will change. We will inform you by email when you are switching over, allowing you to make the necessary changes to your scripts. Please check our HPC changeover notes for more details on what will change.
somAgg code book
This code book provides sample snippets to help you use somAgg in your analyses. These include using bedtools to find the correct chunk file for a region of interest, and using bcftools to query the aggregate files themselves.
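Finding the chunk file for a region is an interval-overlap lookup. The sketch below is a minimal pure-Python stand-in for `bedtools intersect`. It assumes a hypothetical tab-separated chunk index with one line per chunk: chromosome, 0-based start, end, and VCF file name. The real somAgg chunk index may use a different layout and different file names. Query regions are treated as 1-based and inclusive, as in `chr17:7400000-7600000`, and are converted to BED's half-open coordinates before the overlap test.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    path: str   # hypothetical: VCF file that holds this chunk


def read_chunks(lines):
    """Parse BED-style lines (chrom, start, end, path; tab-separated)."""
    chunks = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, path = line.split("\t")[:4]
        chunks.append(Chunk(chrom, int(start), int(end), path))
    return chunks


def chunks_for_region(chunks, chrom, start, end):
    """Return chunk files overlapping the 1-based, inclusive region chrom:start-end."""
    # Convert the 1-based inclusive query to BED's half-open [start - 1, end).
    q_start, q_end = start - 1, end
    return [c.path for c in chunks
            if c.chrom == chrom and c.start < q_end and q_start < c.end]


if __name__ == "__main__":
    # Hypothetical index lines; real chunk names and boundaries will differ.
    index = [
        "chr17\t0\t7500000\tchunk_A.vcf.gz",
        "chr17\t7500000\t15000000\tchunk_B.vcf.gz",
        "chr1\t0\t5000000\tchunk_C.vcf.gz",
    ]
    chunks = read_chunks(index)
    print(chunks_for_region(chunks, "chr17", 7400000, 7600000))
    # prints ['chunk_A.vcf.gz', 'chunk_B.vcf.gz']
```

With this hypothetical index, a region that straddles the 7.5 Mb boundary returns both chunks, so a query crossing a chunk boundary has to read every file listed. On the HPC, the equivalent lookup would normally use `bedtools intersect -wa -wb` between a BED file of query regions and the chunk index.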
Overview
The code snippets assume that you are working in the HPC environment and that you submit jobs to the cluster. Please see the In-Depth Guide to HPC Usage for more information.
Feedback and requests
To send feedback or requests about the somAgg code book, or if you encounter issues running one of the examples, please contact the Genomics England Service Desk and include "somAgg" in the title or description of your inquiry.
Applications
Most queries to somAgg can be implemented using the applications below:
| Application | Description |
|---|---|
| bcftools | A set of utilities that manipulate variant calls in the Variant Call Format (VCF). Use version 1.10.2 via `module load bio/BCFtools/1.10.2-GCC-8.3.0`. |
| split-vep | A bcftools plugin that parses VEP annotations; it is included in the bcftools 1.10.2-GCC-8.3.0 module. |
| LabKey APIs | The LabKey client libraries (APIs) provide programmatic access to the clinical/phenotype data. |
| R / Python | Downstream processing of query results. |
| bedtools | Intersect, merge, count, complement, and shuffle genomic intervals. Use version 2.27.1 via `module load bio/BEDTools/2.27.1-foss-2018b`. |
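The bcftools and split-vep entries above can be driven from Python for downstream processing. The sketch below only builds argument lists for `subprocess` and parses tab-separated output into dictionaries. It assumes `bcftools` is on the `PATH`, for example after `module load bio/BCFtools/1.10.2-GCC-8.3.0` in the job script. The chunk file name, the region, and the VEP subfield names (`SYMBOL`, `Consequence`) are illustrative assumptions, not confirmed somAgg field names.

```python
import shlex
import subprocess


def query_cmd(vcf, region, fields=("CHROM", "POS", "REF", "ALT")):
    """Build a `bcftools query` call that prints selected fields for one region.

    `-r` needs a bgzipped, indexed VCF. The format string contains literal
    backslash escapes, which bcftools expands itself, as when typed in a shell.
    """
    fmt = "\\t".join(f"%{f}" for f in fields) + "\\n"
    return ["bcftools", "query", "-r", region, "-f", fmt, vcf]


def split_vep_cmd(vcf, csq_fields=("SYMBOL", "Consequence")):
    """Build a `bcftools +split-vep` call that extracts VEP subfields.

    Relies on split-vep's default annotation tag (CSQ). The subfield names are
    assumptions; list the real ones for a file with `bcftools +split-vep -l`.
    """
    cols = ["%CHROM", "%POS"] + [f"%{f}" for f in csq_fields]
    return ["bcftools", "+split-vep", "-f", "\\t".join(cols) + "\\n", vcf]


def parse_tsv(text, fields):
    """Turn tab-separated bcftools output into a list of dicts keyed by field."""
    return [dict(zip(fields, line.split("\t")))
            for line in text.splitlines() if line]


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Hypothetical chunk file and region; this only prints the command.
    fields = ("CHROM", "POS", "REF", "ALT")
    cmd = query_cmd("chunk_A.vcf.gz", "chr17:7400000-7600000", fields)
    print(shlex.join(cmd))
    # prints: bcftools query -r chr17:7400000-7600000 -f '%CHROM\t%POS\t%REF\t%ALT\n' chunk_A.vcf.gz
```

Passing the arguments as a list avoids shell quoting of the format string. `shlex.join` (Python 3.8+) is used only to show the equivalent command line. For a real, indexed chunk file, `parse_tsv(run(cmd), fields)` would return one dictionary per output line.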
Code book structure
We have divided the code book into the following sections: