How do I know if a pathogenic variant with a high allele frequency is real or a variant calling error?


I suspect that a pathogenic variant is the result of a sequencing or variant calling error. In our cancer-oriented project, we have noticed a variant in the MSH2 gene with an abnormally high allele frequency in germline VCFs. While studies have linked germline MSH2 mutations to hereditary ovarian cancers, we believe the high allele frequency is unlikely to be real. Could this be the result of a sequencing error or a variant calling issue?


This is likely a variant calling issue in this region. For example, chr2:47414419:GT>G (MSH2) has a high allele frequency in individual germline VCFs. However, the BAM files do not support this call.

More specifically, the region chr2:47414419-47414422 (GTAA for brevity) is followed by a long run of A's. When you inspect the BAM files of the samples whose VCFs display GT>G, the reads are more likely to show a deletion of a single A (chr2:47414420:TA>T). Furthermore, in our germline aggregate (aggV2), chr2:47414419:GT>G is not a PASS variant either.

Variant listed in individual VCF:

chr2 47414419 GT G

Based on the BAM files, however, the more probable variant is:

chr2 47414420 TA T
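
To make the difference between the two calls concrete, the following minimal Python sketch applies each deletion to a toy reference sequence. The reference string is an assumption for illustration only: it starts with the GT at chr2:47414419-47414420 and uses a hypothetical run of 10 A's, not the true length of the A run in the genome. The helper `apply_variant` is likewise hypothetical and not part of any Genomics England tooling.

```python
def apply_variant(reference, start, pos, ref, alt):
    """Apply a VCF-style REF>ALT change at 1-based `pos` to `reference`,
    where `reference` begins at genomic coordinate `start`."""
    offset = pos - start
    if reference[offset:offset + len(ref)] != ref:
        raise ValueError(f"REF allele {ref!r} does not match reference at {pos}")
    return reference[:offset] + alt + reference[offset + len(ref):]


START = 47414419
# Toy reference: "GT" followed by a run of A's (run length is an assumption).
REFERENCE = "GT" + "A" * 10

# Variant listed in the individual VCF: deletes the T.
called = apply_variant(REFERENCE, START, 47414419, "GT", "G")

# Variant the BAM files are more likely to show: deletes one A.
supported = apply_variant(REFERENCE, START, 47414420, "TA", "T")

print("reference:", REFERENCE)
print("GT>G     :", called)     # the T is lost, the A run is intact
print("TA>T     :", supported)  # the T is kept, the A run is one base shorter
```

Both alleles are one base shorter than the reference and differ only in whether the T or one of the A's is removed, which is why a caller can confuse them when reads are aligned across the A run.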

This can lead to false positives when you are working with individual samples. Therefore, we have provided a list of similar known germline variants on our file system. Our internal clinical pipelines also use this list to ensure a high-quality tiering process.

This list can be found here:


If your variant is not part of this list, please reach out to the Genomics England Service Desk so our teams can investigate further.
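
As a rough sketch of how you might check a variant against such a list, the Python snippet below builds a set of variant keys and tests membership. The format of the list is not described here, so the whitespace-separated `chrom pos ref alt` layout, the `#` comment lines, and the helper names `variant_key` and `load_known_variants` are all assumptions; adapt them to the actual file.

```python
def variant_key(chrom, pos, ref, alt):
    """Build a key in the chrom:pos:ref>alt style used in this article."""
    return f"{chrom}:{pos}:{ref}>{alt}"


def load_known_variants(lines):
    """Collect variant keys from lines assumed to hold 'chrom pos ref alt'.
    Blank lines and lines starting with '#' are skipped."""
    known = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, pos, ref, alt = line.split()[:4]
        known.add(variant_key(chrom, int(pos), ref, alt))
    return known


# Illustrative in-memory stand-in for the list (assumed format, not the real file).
example_list = [
    "#chrom pos ref alt",
    "chr2 47414419 GT G",
]

known = load_known_variants(example_list)
is_known = variant_key("chr2", 47414419, "GT", "G") in known
print("Known problematic call:", is_known)
```

In practice you would pass an open file handle for the real list instead of `example_list`, since `load_known_variants` accepts any iterable of lines.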

