Blog

SUMMARY: Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG, an automatically customized pipeline for efficient and scalable normalization of heterogeneous FG data collections into standardized, indexed, rapidly searchable analysis-ready datasets while accounting for FG datatypes (e.g., chromatin interactions, genomic intervals, quantitative trait loci).
AVAILABILITY: hipFG is freely available at https://bitbucket.org/wanglab-upenn/hipFG. A Docker container is available at https://hub.docker.com/r/wanglab/hipfg.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
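hipFG's own code is at the Bitbucket link above. The sketch below is only a hypothetical illustration of what normalizing heterogeneous FG data into a standardized, indexable form can mean. It maps two made-up input layouts (a BED-style interval row and a QTL row with a 1-based position) into one record type with 0-based, half-open coordinates and a datatype tag. It then sorts the records so they could be compressed and indexed. The `Record` schema, the QTL column order and the `chr`-prefix convention are all assumptions, not hipFG's format.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Record:
    """Hypothetical common schema: 0-based, half-open coordinates plus a datatype tag."""
    chrom: str
    start: int
    end: int
    name: str
    datatype: str

def normalize_chrom(chrom: str) -> str:
    # Assumed convention: always use the "chr" prefix (e.g., "1" -> "chr1")
    return chrom if chrom.startswith("chr") else "chr" + chrom

def from_bed(line: str) -> Record:
    # BED rows are already 0-based, half-open: chrom, start, end, name
    chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
    return Record(normalize_chrom(chrom), int(start), int(end), name, "interval")

def from_qtl(line: str) -> Record:
    # Hypothetical QTL row: variant_id, chrom, 1-based position, target gene
    variant_id, chrom, pos, gene = line.rstrip("\n").split("\t")[:4]
    p = int(pos)
    return Record(normalize_chrom(chrom), p - 1, p, f"{variant_id}:{gene}", "QTL")

def harmonize(records: Iterable[Record]) -> List[Record]:
    # Sort by chromosome name, then numerically by start/end, ready for compression and indexing
    return sorted(records, key=lambda r: (r.chrom, r.start, r.end))
```

Sorting by chromosome name and then numerically by start mirrors the `sort -k1,1 -k2,2n` order commonly used before `bgzip`/`tabix` indexing.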

INTRODUCTION: The National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) Alzheimer’s Genomics Database (GenomicsDB) is a public knowledge base of Alzheimer’s disease (AD) genetic datasets and genomic annotations.
METHODS: GenomicsDB uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant genome-wide association study summary statistics datasets with functional annotations, including over 230 million annotated variants from the AD Sequencing Project.
RESULTS: GenomicsDB generates interactive reports compiled from the harmonized datasets and annotations. These reports contextualize AD-risk associations in a broader functional genomic setting and summarize them in the context of functionally annotated genes and variants.
DISCUSSION: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB is designed to guide users who are unfamiliar with genetic data not only in exploring but also in interpreting this ever-growing volume of data. Because it is scalable and uses standard data technologies to interoperate with other genomics resources, the GenomicsDB can serve as a central hub for research and data analysis on AD and related dementias.
HIGHLIGHTS: The National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) offers the public a unique, disease-centric collection of AD-relevant GWAS summary statistics datasets. Interpreting these data is challenging. It requires significant bioinformatics expertise to standardize the datasets and harmonize them with functional annotations at genome-wide scale. The NIAGADS Alzheimer’s GenomicsDB helps overcome these challenges by providing a user-friendly public knowledge base for AD-relevant genetics. It shares harmonized, annotated summary statistics datasets from the NIAGADS repository in an interpretable, easily searchable format.
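One routine step when harmonizing GWAS summary statistics against annotated variants is to align each reported effect allele to the variant's reference/alternate alleles. The sketch below is a generic illustration of that step, not GenomicsDB's documented procedure. It re-expresses a reported effect size (beta) per copy of the alternate allele and flips the sign when the alleles are swapped. For single-base alleles it also tries the complementary strand. Strand-ambiguous A/T and C/G SNPs are dropped, which is one common conservative choice among several.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def harmonize_beta(effect: str, other: str, beta: float, ref: str, alt: str) -> Optional[float]:
    """Return beta re-expressed per copy of the `alt` allele, or None if the alleles cannot be aligned."""
    effect, other, ref, alt = (a.upper() for a in (effect, other, ref, alt))
    candidates = [(effect, other)]
    if effect in COMPLEMENT and other in COMPLEMENT:
        if COMPLEMENT[effect] == other:
            return None  # A/T or C/G SNP: strand cannot be resolved from the alleles alone
        candidates.append((COMPLEMENT[effect], COMPLEMENT[other]))  # try the opposite strand
    for e, o in candidates:
        if (e, o) == (alt, ref):
            return beta   # effect allele already is the alt allele
        if (e, o) == (ref, alt):
            return -beta  # alleles swapped: flip the sign of the effect
    return None  # allele mismatch: drop the variant
```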

Alzheimer’s disease (AD), the leading cause of dementia, has an estimated heritability of approximately 70%¹. The genetic component of AD has been assessed mainly through genome-wide association studies, which do not capture the risk contributed by rare variants². Here, we compared the gene-based burden of rare damaging variants in exome sequencing data from 32,558 individuals (16,036 AD cases and 16,522 controls). In addition to variants in TREM2, SORL1 and ABCA7, we observed a significant association of rare, predicted damaging variants in ATP8B4 and ABCA1 with AD risk, and a suggestive signal in ADAM10. The rare-variant burden in RIN3, CLU, ZCWPW1 and ACE also highlighted these genes as potential drivers of their respective AD genome-wide association study loci. Variants with the strongest effects on AD risk, in particular loss-of-function variants, are enriched in early-onset AD cases. Our results provide additional evidence for a major role of amyloid-β precursor protein processing, amyloid-β aggregation, lipid metabolism and microglial function in AD.
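Gene-based burden tests collapse many rare variants into a single per-gene signal. The sketch below assumes the qualifying rare, predicted-damaging variants for one gene have already been selected. Each individual is scored as a carrier if they carry at least one qualifying allele. Carrier counts in cases and controls are then compared with a two-sided Fisher's exact test. This is a simplified illustration, not the authors' method. It omits the covariates (for example, ancestry principal components) that large studies typically adjust for.

```python
from math import comb
from typing import List, Sequence, Tuple

def carriers(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """1 if an individual carries >= 1 qualifying rare allele in the gene, else 0.
    genotypes[i][j] is individual i's alt-allele count (0/1/2) at qualifying variant j."""
    return [int(any(a > 0 for a in row)) for row in genotypes]

def carrier_table(carrier: Sequence[int], is_case: Sequence[int]) -> Tuple[int, int, int, int]:
    """2x2 counts: (case carriers, case non-carriers, control carriers, control non-carriers)."""
    pairs = list(zip(carrier, is_case))
    case_car = sum(1 for k, y in pairs if k and y)
    case_non = sum(1 for k, y in pairs if not k and y)
    ctrl_car = sum(1 for k, y in pairs if k and not y)
    ctrl_non = sum(1 for k, y in pairs if not k and not y)
    return case_car, case_non, ctrl_car, ctrl_non

def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P: total probability of tables no more likely than the observed one."""
    n, cases, n_carriers = a + b + c + d, a + b, a + c

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the cases, margins fixed
        return comb(n_carriers, x) * comb(n - n_carriers, cases - x) / comb(n, cases)

    p_obs = prob(a)
    lo, hi = max(0, cases + n_carriers - n), min(cases, n_carriers)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9)))
```

Exact big-integer binomials keep the arithmetic precise, but they are slow at cohort scales like the one above. A production test would use a library routine or a regression-based burden test instead.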

Non-coding genetic variants, which lie outside the protein-coding regions of the genome, play an important role in genetic and epigenetic regulation. Understanding their roles has become increasingly important because non-coding variants often make up the majority of top findings from genome-wide association studies (GWAS). In addition, the growing number of disease-specific whole-genome sequencing (WGS) efforts expands the catalog of known non-coding variants. It also offers unique opportunities to investigate both common and rare non-coding variants, including rare variants that more limited GWAS approaches typically do not detect. However, the sheer size and breadth of WGS data make predicting functional impact harder, both in data analysis and in interpretation. This review focuses on recent approaches for efficient, at-scale annotation and prioritization of non-coding variants uncovered in WGS analyses. In particular, we review the latest scalable annotation tools, databases, and functional genomic resources for interpreting variant findings from WGS, based on both experimental data and in silico predictive annotations. We also review machine learning-based predictive models for variant scoring and prioritization. We conclude with a discussion of future research directions that will enhance the data and tools necessary for effective functional analyses of variants identified by WGS to improve our understanding of disease etiology.

The success of genome-wide association studies (GWAS) over the last 15 years has reinforced a key fact: polygenic architecture makes a substantial contribution to variation in susceptibility to complex diseases, including Alzheimer’s disease. One straightforward way to capture this architecture and predict which individuals in a population are most at risk is to calculate a polygenic risk score (PRS). This score aggregates the risk conferred by multiple genetic variants and represents an individual’s predicted genetic susceptibility to a disease. PRS have received increasing attention after being applied successfully to complex traits. This success has in turn renewed interest in methods that improve the accuracy of risk prediction. While these initial applications are informative, their utility is far from equitable: most PRS models are derived from samples composed largely, if not entirely, of individuals of European descent. This raises health-equity concerns if such models are applied inaccurately to other population groups, and health-disparity concerns if they are not used at all. In this review we examine the methods used to calculate PRS and some of their previous uses in disease prediction. We also advocate, with supporting scientific evidence, for the inclusion of data from diverse populations in existing and future studies of population risk via PRS.
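In its simplest form, the score described above is a weighted sum, PRS_i = Σ_j β_j · x_ij. Here x_ij is individual i's count (0, 1 or 2) of the effect allele at variant j, and β_j is that allele's per-allele effect from a GWAS (for example, a log odds ratio). The sketch below computes this sum from hypothetical variant IDs and weights, then standardizes the scores within a cohort. It does not show the variant-selection step (for example, clumping and P-value thresholding) that real PRS methods perform.

```python
import statistics
from typing import List, Mapping

def polygenic_risk_score(dosages: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted sum of effect-allele dosages over variants present in both inputs."""
    shared = dosages.keys() & weights.keys()  # variants missing from either side are skipped
    return sum(weights[v] * dosages[v] for v in shared)

def standardize(scores: List[float]) -> List[float]:
    """Z-score PRS values within a cohort so individuals can be ranked on a common scale."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]

# Hypothetical per-allele log-odds weights and two individuals' dosages
weights = {"rsA": 0.30, "rsB": -0.10, "rsC": 0.05}
cohort = [
    {"rsA": 2, "rsB": 1, "rsC": 0},
    {"rsA": 0, "rsB": 2, "rsC": 1},
]
scores = [polygenic_risk_score(d, weights) for d in cohort]
z = standardize(scores)
```

The weights come entirely from the GWAS discovery sample. For that reason, weights estimated in European-ancestry samples can transfer poorly to other populations, which is the equity concern raised above.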

INTRODUCTION: Variants in the tau gene (MAPT) region are associated with breast cancer in women and Alzheimer’s disease (AD) among persons lacking apolipoprotein E ε4 (ε4-).
METHODS: To identify novel genes associated with tau-related pathology, we conducted two genome-wide association studies (GWAS) for AD, one among 10,340 ε4- women in the Alzheimer’s Disease Genetics Consortium (ADGC) and another in 31 members (22 women) of a consanguineous Hutterite kindred.
RESULTS: We identified novel associations of AD with MGMT variants in the ADGC (rs12775171, odds ratio [OR] = 1.4, P = 4.9 × 10⁻⁸) and Hutterite (rs12256016 and rs2803456, OR = 2.0, P = 1.9 × 10⁻¹⁴) datasets. Multi-omics analyses showed that the most significant associations, and the largest number of associations, among the single nucleotide polymorphisms (SNPs), DNA-methylated CpGs, MGMT expression, and AD-related neuropathological traits were observed among women. Furthermore, promoter capture Hi-C analyses revealed long-range interactions of the MGMT promoter with MGMT SNPs and CpG sites.
DISCUSSION: These findings suggest that epigenetically regulated MGMT expression is involved in AD pathogenesis, especially in women.
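The abstract does not state which model produced the odds ratios above. GWAS of a binary trait usually fit a logistic regression and report OR = exp(β) for the log-odds coefficient β, so treat that as an assumption here. The sketch below shows that conversion, a 95% Wald interval, a two-sided Wald P value and a check against the conventional genome-wide threshold of 5 × 10⁻⁸. It is illustrative only and does not reproduce the study's analysis.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

def summarize_log_odds(beta: float, se: float):
    """Odds ratio, 95% Wald CI and two-sided Wald P from a log-odds estimate and its standard error."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    ci = (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se))
    return math.exp(beta), ci, p

def genome_wide_significant(p: float, alpha: float = GENOME_WIDE_ALPHA) -> bool:
    return p < alpha
```

Both reported P values (4.9 × 10⁻⁸ and 1.9 × 10⁻¹⁴) fall below 5 × 10⁻⁸, the first only narrowly.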

Approximately 30% of elderly adults are cognitively unimpaired at time of death despite the presence of Alzheimer’s disease (AD) neuropathology at autopsy. Studying individuals who are resilient to the cognitive consequences of AD neuropathology may uncover novel therapeutic targets to treat AD. It is well established that there are sex differences in response to AD pathology, and growing evidence suggests that genetic factors may contribute to these differences. We therefore sought to elucidate sex-specific genetic drivers of resilience. We extended our recent large-scale genomic analysis of resilience, in which we harmonized cognitive data across four cohorts of cognitive aging, in vivo amyloid PET across two cohorts, and autopsy measures of amyloid neuritic plaque burden across two cohorts. These data were leveraged to build robust, continuous resilience phenotypes. With these phenotypes, we performed sex-stratified (N(males) = 2,093, N(females) = 2,931) and sex-interaction (N(both sexes) = 5,024) genome-wide association studies (GWAS). We also performed gene- and pathway-based tests and genetic correlation analyses. Together, these analyses aimed to clarify the variants, genes, and molecular pathways that relate to resilience in a sex-specific manner. Among cognitively normal individuals, resilience was 20-25% heritable when estimated in both sexes combined and 15-44% heritable when estimated in either sex alone. In our GWAS, we identified a female-specific locus on chromosome 10 (rs827389, β(females) = 0.08, P(females) = 5.76E-09, β(males) = -0.01, P(males) = 0.70, β(interaction) = 0.09, P(interaction) = 1.01E-04) in which the minor allele was associated with higher resilience scores among females. This locus is located within chromatin loops that interact with promoters of genes involved in RNA processing, including GATA3.
Finally, our genetic correlation analyses revealed shared genetic architecture between resilience phenotypes and other complex traits. These included a female-specific association with frontotemporal dementia and male-specific associations with heart rate variability traits. We also observed opposing associations between the sexes for multiple sclerosis: more resilient females had lower genetic susceptibility to multiple sclerosis, whereas more resilient males had higher genetic susceptibility. Overall, we identified sex differences in the genetic architecture of resilience and a female-specific resilience locus. We also highlighted numerous sex-specific molecular pathways that may underlie resilience to AD pathology. This study illustrates the need for sex-aware genomic analyses to identify novel targets that sex-agnostic models miss. Our findings support the theory that the most successful treatment for an individual with AD may be personalized based on their biological sex and genetic context.
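For one variant, a sex-interaction GWAS like the one described above can be written as the linear model y = b0 + b_snp·g + b_sex·female + b_int·(g × female). Here g is allele dosage and female is coded 1 for females and 0 for males. Under this coding, b_snp is the variant's effect in males and b_snp + b_int is its effect in females. The reported estimates fit this relationship: -0.01 + 0.09 = 0.08. The sketch below fits that model by ordinary least squares with NumPy on simulated, noise-free data. It is an illustrative assumption that omits the covariates a real analysis would include. Note also that separately fitted sex-stratified models will generally match this identity only approximately.

```python
import numpy as np

def fit_sex_interaction(dosage, female, y):
    """OLS fit of y = b0 + b_snp*g + b_sex*female + b_int*(g*female)."""
    g = np.asarray(dosage, dtype=float)
    s = np.asarray(female, dtype=float)
    X = np.column_stack([np.ones_like(g), g, s, g * s])  # design matrix with interaction column
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    b0, b_snp, b_sex, b_int = coef
    return {"intercept": b0, "snp_males": b_snp, "sex": b_sex,
            "interaction": b_int, "snp_females": b_snp + b_int}

# Noise-free simulated data with hypothetical coefficients chosen to echo the reported estimates
rng = np.random.default_rng(0)
g = rng.integers(0, 3, size=500)
female = rng.integers(0, 2, size=500)
y = 0.2 - 0.01 * g + 0.05 * female + 0.09 * g * female
fit = fit_sex_interaction(g, female, y)
```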

Over 90% of genetic variants are rare in most modern sequencing studies, such as the Alzheimer’s Disease Sequencing Project (ADSP) Whole Exome Sequencing (WES) data. Furthermore, 54% of the rare variants in ADSP WES are singletons. However, both single-variant and unit-based tests have limited statistical power to detect associations between rare variants and phenotypes. To make the best use of rare missense variants and investigate their biological effects, we examine their association with phenotypes in the context of protein structures. We developed a protein structure-based approach, POKEMON (Protein Optimized Kernel Evaluation of Missense Nucleotides), which evaluates rare missense variants based on their spatial distribution within a protein rather than their allele frequency. The hypothesis behind this test is that the three-dimensional spatial distribution of variants within a protein structure provides functional context that can power an association test. POKEMON identified three candidate genes (TREM2, SORL1, and EXOC3L4) and one additional suggestive gene in the ADSP WES data. TREM2 and SORL1 are two known Alzheimer’s disease (AD) genes. For both, the signal from the spatial cluster remains stable even when known AD risk variants are excluded, indicating the presence of additional low-frequency risk variants within these genes. EXOC3L4 is a novel AD risk gene with a cluster of variants around the Sec6 domain that are carried primarily by cases. This cluster was also validated in an independent replication dataset and in a validation dataset with a larger sample size.
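The idea of scoring rare missense variants by where they fall in a protein's 3D structure can be illustrated with a kernel association statistic. The sketch below is a SKAT-style variance-component score, not POKEMON's published implementation. Variant pairs that lie close together in the structure get a high kernel weight, computed as a Gaussian of their inter-residue distance. The bandwidth `sigma` (in ångströms) is hypothetical. Under this weighting, the statistic Q grows when case-enriched variants cluster spatially. Residue coordinates, genotypes and phenotypes are assumed inputs, and significance is assessed here by simple phenotype permutation.

```python
import numpy as np

def spatial_kernel(coords, sigma=6.0):
    """Gaussian kernel on pairwise 3D distances between variant residues (e.g., C-alpha atoms)."""
    xyz = np.asarray(coords, dtype=float)
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(axis=-1)  # squared pairwise distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_score(G, y, K):
    """Q = r^T G K G^T r, with r the mean-centred phenotype and G an n x m genotype matrix."""
    r = np.asarray(y, dtype=float) - np.mean(y)
    s = np.asarray(G, dtype=float).T @ r  # per-variant score, positive when carriers are case-enriched
    return float(s @ K @ s)

def permutation_pvalue(G, y, K, n_perm=999, seed=0):
    """Empirical P value from permuting phenotype labels (genotypes and structure stay fixed)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    q_obs = kernel_score(G, y, K)
    hits = sum(kernel_score(G, rng.permutation(y), K) >= q_obs for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Toy usage with random, hypothetical inputs
rng = np.random.default_rng(1)
coords = rng.normal(scale=10.0, size=(5, 3))  # hypothetical residue positions
G = rng.integers(0, 2, size=(40, 5))          # 40 individuals x 5 rare variants
y = rng.integers(0, 2, size=40)               # 1 = case, 0 = control
K = spatial_kernel(coords)
q = kernel_score(G, y, K)
p = permutation_pvalue(G, y, K, n_perm=199)
```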

Querying massive functional genomic and annotation data collections, linking and summarizing the query results across data sources/data types are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge with a large, curated integrated catalog of harmonized functional genomic and annotation data coupled with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) ability to analyze and annotate user’s experimental data. This rich resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying 7 × 10⁹ hg19 FILER records shows FILER is highly scalable, with a sub-linear 32-fold increase in querying time when increasing the number of queries 1000-fold from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating/querying large-scale genomic data within analyses/workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).
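FILER's own indexing and query engine is not described here. The sketch below only illustrates the basic operation FILER scales up: finding all stored genomic intervals that overlap a query interval. It keeps one chromosome's intervals sorted by start and bounds the candidates with binary search plus the longest stored interval length. This is a simple textbook technique chosen for clarity; production tools typically use binned or tree-based indexes. Coordinates are assumed to be 0-based and half-open.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple

Interval = Tuple[int, int, str]  # (start, end, label); 0-based, half-open

class IntervalIndex:
    """Sorted-array overlap index for the intervals of one chromosome."""

    def __init__(self, intervals: Iterable[Interval]):
        self.ivs: List[Interval] = sorted(intervals)
        self.starts = [s for s, _, _ in self.ivs]
        self.max_len = max((e - s for s, e, _ in self.ivs), default=0)

    def query(self, qs: int, qe: int) -> List[Interval]:
        """Return stored intervals overlapping [qs, qe)."""
        # Any overlapping interval must start in [qs - max_len, qe), so only that slice is scanned.
        lo = bisect_left(self.starts, qs - self.max_len)
        hi = bisect_left(self.starts, qe)
        return [iv for iv in self.ivs[lo:hi] if iv[1] > qs]
```

The `max_len` bound degrades when a few stored intervals are very long, which is one reason real indexes use bins or trees. As a rough reading of the benchmark above, suppose query time follows a power law T ∝ n^α. A 32-fold increase for 1000-fold more queries then implies α = log 32 / log 1000 ≈ 0.50. The cost per query interval therefore falls roughly 31-fold (1000 / 32).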

INTRODUCTION: Progranulin (GRN) mutations occur in frontotemporal lobar degeneration (FTLD) and in Alzheimer’s disease (AD), often with TDP-43 pathology.
METHODS: We determined the frequency of rs5848 and of rare, pathogenic GRN mutations in two autopsy cohorts and one family cohort. We compared Braak stage, β-amyloid load, hyperphosphorylated tau (PHFtau) tangle density and TDP-43 pathology between GRN carriers and non-carriers.
RESULTS: Pathogenic GRN mutations were more frequent in all cohorts than in the Genome Aggregation Database (gnomAD), but there was no evidence of association with AD. Pathogenic GRN carriers had significantly higher PHFtau tangle density after adjusting for age, sex and APOE ε4 genotype. AD patients carrying rs5848 had higher frequencies of hippocampal sclerosis and TDP-43 deposits. Twenty-two rare, pathogenic GRN variants were observed in the family cohort.
DISCUSSION: GRN mutations in clinical and neuropathological AD increase the burden of tau-related brain pathology but show no specific association with β-amyloid load or AD.