Aggregate Variant Testing "input variables" file¶
The "input variables" file (or simply inputs file), linked in the main AVT workflow directory as "inputs.json", is a large file with many options that you can vary. An example of an "input variables" file is shown below, followed by a breakdown by section and an explanation of each input. Where appropriate, the documentation refers to an external source (e.g. the SAIGE-GENE options).
Important note
The "input variables" file is a JSON file, and therefore it does not support the use of comments. All comments in section 'Components of the "input variables" file', introduced by an arrow "<-", are added only for explanatory purposes. Please make sure you don't include the comments in your actual inputs file.
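Because a leftover explanatory comment makes the file invalid JSON, it can help to check the file before launching a run. The following is a minimal sketch using Python's standard `json` module. The helper name `check_inputs_text` is hypothetical and not part of the AVT workflow, and the check that the top level is a JSON object is an assumption about the file's layout.

```python
import json


def check_inputs_text(text):
    """Return (ok, message) for the text of an inputs file.

    Hypothetical helper for illustration; not part of the AVT workflow.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as err:
        hint = ""
        lines = text.splitlines()
        # err.lineno is 1-based; look for a leftover "<-" explanatory comment.
        if 0 < err.lineno <= len(lines) and "<-" in lines[err.lineno - 1]:
            hint = " (this line seems to contain a '<-' comment)"
        return False, f"line {err.lineno}: {err.msg}{hint}"
    # Assumption: the inputs file is a single JSON object of name/value pairs.
    if not isinstance(parsed, dict):
        return False, "top level of the inputs file should be a JSON object"
    return True, f"valid JSON with {len(parsed)} input variables"
```

To use it, read the contents of your "inputs.json" into a string and pass that string to `check_inputs_text`; the returned message points at the first line that fails to parse.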
Example "input variables" file¶
Example input file
Components of the "input variables" file¶
Blank lines help separate different sections of the inputs file.
The example inputs file shown above, which is the default for v2.0 of the workflow, is composed of two main parts, separated by five consecutive blank lines.
The top part contains variables that you are likely to modify to customise your workflow run (although many of them can be left at their default values, see details below). This part is divided into five sections, separated from one another by three consecutive blank lines. Each section holds the input variables for a different part of the workflow, and the variables in each section are prefixed as follows:
- master_aggregate_variant_testing
- master_aggregate_variant_testing.part_1_inputs
- master_aggregate_variant_testing.part_2_filtering
- master_aggregate_variant_testing.part_3_GRM_creation
- master_aggregate_variant_testing.part_4_testing
The bottom part of the inputs file contains variables that you are unlikely to modify, although these variables too can be customised if needed.
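The prefixes listed above can be used to group the variables of a parsed inputs file by section. The sketch below assumes that each variable name consists of its section prefix, a dot, and the variable's own name (for example "master_aggregate_variant_testing.part_1_inputs.genes_input_file", an assumed full name). The functions `section_of` and `group_by_section` are hypothetical helpers, and variables from the bottom part of the file may fall under the main prefix or under no listed prefix at all.

```python
SECTION_PREFIXES = [
    "master_aggregate_variant_testing",
    "master_aggregate_variant_testing.part_1_inputs",
    "master_aggregate_variant_testing.part_2_filtering",
    "master_aggregate_variant_testing.part_3_GRM_creation",
    "master_aggregate_variant_testing.part_4_testing",
]


def section_of(key):
    """Return the longest listed prefix that matches key, or None."""
    # The main prefix is contained in all the others, so pick the longest match.
    matches = [p for p in SECTION_PREFIXES if key.startswith(p + ".")]
    return max(matches, key=len) if matches else None


def group_by_section(inputs):
    """Group a parsed inputs dictionary by section prefix."""
    grouped = {}
    for key, value in inputs.items():
        grouped.setdefault(section_of(key), {})[key] = value
    return grouped
```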
Top section - Main workflow file inputs¶
This section contains the following input variables with explanations:
Example main workflow file inputs
Top section - Workflow part 1 inputs (translating inputs to chromosomal regions)¶
This section contains the following input variables with explanations:
Part 1 inputs
Only one of "genes_input_file", "coordinates_input_file", and "groups_input_file" needs to be specified. If a "chromosomes_input_file" is specified together with one of those other files, it will be used to subset the other file's content; if it is specified on its own (as in the default example), it will be translated into the corresponding list of Ensembl genes. For the file paths, both relative and absolute paths are accepted.
Top section - Workflow part 2 inputs (filtering)¶
This section contains the following input variables with explanations:
Part 2 Inputs
The workflow uses a Python script to filter on the VEP functional annotation. This affects how you specify filters on numbers versus strings.
Numbers are straightforward. The syntax is as follows: {"score": "gnomADg_AF", "condition": "<0.001"}. You can filter numbers with any valid comparison operator (<, >, ==, !=, >=, <=).
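To make the meaning of a numeric condition concrete, the sketch below shows one way such a string could be interpreted: an operator followed by a threshold, applied to the annotation value. It is an illustration of the semantics only and is not taken from the workflow's filtering script; the function name `passes` and the parsing approach are assumptions.

```python
import operator
import re

OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    "<=": operator.le,
    "<": operator.lt,
    ">": operator.gt,
}

# An operator followed by a plain or scientific-notation number.
CONDITION = re.compile(
    r"\s*(==|!=|>=|<=|<|>)\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*"
)


def passes(value, condition):
    """Return True if the numeric value satisfies a condition like "<0.001"."""
    match = CONDITION.fullmatch(condition)
    if match is None:
        raise ValueError(f"not a numeric condition: {condition!r}")
    symbol, threshold = match.groups()
    return OPERATORS[symbol](float(value), float(threshold))
```

Under this reading, a variant with a "gnomADg_AF" value of 0.0005 satisfies "<0.001", while a value of 0.01 does not.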
Strings behave a little differently. The syntax is as in the following example: {"score": "LoF", "condition": "==\"HC\"", "include_missing": "no"}. Note the escaped double quotes (\") around the string that you want to match. They are required by the underlying Python script. If they are omitted, you will get an error like the following: "Error: object 'HC' not found"
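Escaping the inner quotes by hand is easy to get wrong. One option is to let Python's standard `json` module produce the entry, since `json.dumps` escapes embedded double quotes automatically. The helpers `string_condition` and `filter_entry` below are hypothetical and shown only as a sketch; the field names come from the examples above.

```python
import json


def string_condition(symbol, text):
    """Build a string condition such as =="HC" (inner quotes unescaped)."""
    return f'{symbol}"{text}"'


def filter_entry(score, condition, include_missing=None):
    """Serialise one filter entry as JSON text, escaping quotes as needed."""
    entry = {"score": score, "condition": condition}
    if include_missing is not None:
        entry["include_missing"] = include_missing
    return json.dumps(entry)


# Numeric filter: no inner quotes are needed.
numeric = filter_entry("gnomADg_AF", "<0.001")

# String filter: json.dumps writes the inner quotes as \" in the output.
text = filter_entry("LoF", string_condition("==", "HC"), "no")
```

Printing `text` gives the string filter exactly as written in the example above, with `\"` around HC, and it can be pasted into the inputs file as is.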
Top section - Workflow part 3 inputs (preparing files for GRM creation)¶
This section contains the following input variables with explanations:
Part 3 inputs
Top section - Workflow part 4 inputs (Aggregate Variant Tests with SAIGE-GENE)¶
This section contains the following input variables with explanations:
Part 4 inputs
Bottom section¶
This section contains the following input variables with explanations:
Bottom section
Help and support¶
Please reach out via the Genomics England Service Desk for any issues related to running this script, including "AVT_workflow" in the title/description of your inquiry.