An association test of the spatial distribution of rare missense variants within protein structures identifies Alzheimer’s disease-related patterns

Over 90% of genetic variants are rare in most modern sequencing studies, such as the Alzheimer’s Disease Sequencing Project (ADSP) Whole Exome Sequencing (WES) data. Furthermore, 54% of the rare variants in ADSP WES are singletons. However, both single-variant and unit-based tests have limited statistical power to detect associations between rare variants and phenotypes. To make the best use of rare missense variants and investigate their biological effects, we examine their association with phenotypes in the context of protein structures. We developed a protein structure-based approach, POKEMON (Protein Optimized Kernel Evaluation of Missense Nucleotides), which evaluates rare missense variants based on their spatial distribution within a protein rather than their allele frequency. The hypothesis behind this test is that the three-dimensional spatial distribution of variants within a protein structure provides functional context that can power an association test. POKEMON identified three candidate genes (TREM2, SORL1, and EXOC3L4) and another suggestive gene from the ADSP WES data. For TREM2 and SORL1, two known Alzheimer’s disease (AD) genes, the signal from the spatial cluster remains stable even when known AD risk variants are excluded, indicating the presence of additional low-frequency risk variants within these genes. EXOC3L4 is a novel AD risk gene with a cluster of variants, primarily shared by case subjects, around the Sec6 domain. This cluster is also validated in an independent replication dataset and in a validation dataset with a larger sample size.
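
The idea of scoring variants by spatial proximity rather than allele frequency can be illustrated with a minimal sketch. This is not the POKEMON implementation: the Gaussian distance weighting, the `bandwidth` parameter, and the function names `structure_kernel` and `score_statistic` are assumptions chosen for illustration, and the statistic shown is a generic variance-component style score.

```python
import math


def structure_kernel(genotypes, coords, bandwidth=10.0):
    """Subject-by-subject similarity from spatial proximity of carried variants.

    genotypes: per-subject lists of 0/1 carrier flags, one flag per variant.
    coords: (x, y, z) residue coordinates in angstroms, one per variant.
    bandwidth: hypothetical Gaussian decay length in angstroms (an assumption).
    """
    m = len(coords)
    # Variant-by-variant weights: residues that are close in 3D get weights near 1.
    w = [[math.exp(-math.dist(coords[a], coords[b]) ** 2 / (2 * bandwidth ** 2))
          for b in range(m)] for a in range(m)]
    n = len(genotypes)
    k = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Two subjects are similar when their variants sit near each other.
            k[i][j] = sum(genotypes[i][a] * w[a][b] * genotypes[j][b]
                          for a in range(m) for b in range(m))
    return k


def score_statistic(phenotype, kernel):
    """Score Q = r^T K r, where r is the mean-centred phenotype vector."""
    mean = sum(phenotype) / len(phenotype)
    r = [y - mean for y in phenotype]
    n = len(r)
    return sum(r[i] * kernel[i][j] * r[j] for i in range(n) for j in range(n))


# Toy data: variants 0 and 1 are 1 angstrom apart; variant 2 is far away.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (50.0, 0.0, 0.0)]
genotypes = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
kernel = structure_kernel(genotypes, coords)
q_clustered = score_statistic([1, 1, 0, 0], kernel)  # cases carry nearby variants
q_scattered = score_statistic([1, 0, 1, 0], kernel)  # cases carry distant variants
```

In this toy setting the score is larger when the case subjects carry spatially neighbouring variants, even though every variant is a singleton. Assessing significance would additionally require a null distribution for the score (for example by permutation), which the sketch does not cover.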