
somAgg code book


We provide this code book to help you use somAgg in your analyses. It is a living document and will be updated in response to feedback and requests.

The code snippets assume that you are working in the HPC environment and submitting jobs to the cluster. For more information, please see the In-Depth Guide to HPC Usage.
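To illustrate what submitting work to the cluster can look like from Python, the sketch below assumes an LSF-style scheduler driven by bsub. The queue name, project code, log prefix and the helper name build_bsub_command are hypothetical placeholders, not values taken from this guide; check the In-Depth Guide to HPC Usage for the settings that apply to you. The function only assembles the argument list so you can inspect it before submitting.

```python
import shlex
from typing import List, Optional


def build_bsub_command(
    command: str,
    queue: str,
    project: str,
    job_name: str,
    log_prefix: str,
    mem_mb: Optional[int] = None,
) -> List[str]:
    """Assemble (but do not run) an LSF bsub call that executes `command` via bash -c."""
    args = [
        "bsub",
        "-q", queue,                 # queue name (hypothetical placeholder)
        "-P", project,               # project code (hypothetical placeholder)
        "-J", job_name,              # job name, as shown by bjobs
        "-o", f"{log_prefix}.out",   # file for the job's standard output
        "-e", f"{log_prefix}.err",   # file for the job's standard error
    ]
    if mem_mb is not None:
        # Memory request via an LSF resource requirement string; units are site-configured.
        args += ["-R", f"rusage[mem={mem_mb}]"]
    # Everything after the options is the command LSF runs on the compute node.
    args += ["bash", "-c", command]
    return args


if __name__ == "__main__":
    cmd = build_bsub_command(
        "bcftools --version",
        queue="short",            # hypothetical queue name
        project="my_project",     # hypothetical project code
        job_name="bcftools_test",
        log_prefix="bcftools_test",
    )
    # Inspect the command; on the cluster you could submit it with
    # subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

The unit used by rusage[mem=...] depends on the site's LSF configuration, so the mem_mb name is itself an assumption.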

Feedback and requests

For feedback or requests regarding the somAgg code book, or if you encounter issues running one of the examples, please contact the Genomics England Service Desk and include "somAgg" in the title or description of your enquiry.


Most queries to somAgg can be implemented using the applications below:

| Application | Description |
| --- | --- |
| bcftools | A set of utilities for manipulating variant calls in the Variant Call Format (VCF). Use version 1.10.2, loaded with: module load bio/BCFtools/1.10.2-GCC-8.3.0 |
| split-vep | A bcftools plug-in for parsing VEP annotation. It is included with bcftools version 1.10.2-GCC-8.3.0. |
| LabKey APIs | The LabKey client libraries (APIs) provide programmatic access to the clinical and phenotype data. |
| R / Python | For downstream processing. |
| bedtools | Utilities to intersect, merge, count, complement, and shuffle genomic intervals. Use version 2.27.1, loaded with: module load bio/BEDTools/2.27.1-foss-2018b |
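As a rough illustration of how the split-vep plug-in and Python from the table above can work together, the sketch below builds a bcftools +split-vep call that prints one tab-separated line per consequence, then parses that text into Python dictionaries for downstream processing. It assumes the bcftools module is loaded. The VCF path and the SYMBOL and Consequence fields are hypothetical examples, as are the helper names; list the annotation fields actually present in your file (for example with the plug-in's -l option) before relying on them.

```python
import subprocess
from typing import Dict, Iterable, List

# Hypothetical field selection: CHROM/POS/REF/ALT are standard VCF columns, while
# SYMBOL and Consequence are assumed to be present in the VEP (CSQ) annotation.
FIELDS = ["CHROM", "POS", "REF", "ALT", "SYMBOL", "Consequence"]


def build_split_vep_command(vcf_path: str, fields: List[str]) -> List[str]:
    """Build a bcftools +split-vep call that prints one tab-separated line per consequence."""
    # Literal backslash escapes (\t, \n) are expanded by bcftools, as in shell usage.
    fmt = "\\t".join(f"%{name}" for name in fields) + "\\n"
    # -d writes each consequence on its own line instead of comma-joining them.
    return ["bcftools", "+split-vep", "-f", fmt, "-d", vcf_path]


def parse_split_vep_output(lines: Iterable[str], fields: List[str]) -> List[Dict[str, str]]:
    """Turn tab-separated split-vep output into a list of dicts keyed by field name."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        values = line.split("\t")
        if len(values) != len(fields):
            raise ValueError(f"expected {len(fields)} columns, got {len(values)}: {line!r}")
        records.append(dict(zip(fields, values)))
    return records


def run_split_vep(vcf_path: str, fields: List[str]) -> List[Dict[str, str]]:
    """Run split-vep (requires bcftools on PATH) and parse its output."""
    cmd = build_split_vep_command(vcf_path, fields)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_split_vep_output(result.stdout.splitlines(), fields)
```

Keeping command construction separate from parsing means the parser can be reused on output from a batch job written to a file, rather than only on a live subprocess call.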

We strongly recommend reading the documentation for each of these tools.

Code book structure

We have divided the code book into the following sections:

Last update: November 27, 2023