Blog

INTRODUCTION: The Alzheimer’s Disease Sequencing Project (ADSP) is a national initiative to understand the genetic architecture of Alzheimer’s disease and related dementias (ADRD) by integrating whole genome sequencing (WGS) with other genetic, phenotypic, and harmonized datasets from diverse populations.
METHODS: The Genome Center for Alzheimer’s Disease (GCAD) uniformly processed WGS from 36,361 ADSP samples, including 35,014 genetically unique participants, of whom 45% are of non-European ancestry, across 17 cohorts in 14 countries in this fourth release (R4).
RESULTS: This sequencing effort identified 387 million bi-allelic variants, 42 million short insertions/deletions, and 6.8 million structural variants. Annotations and quality control data are available for all variants and samples. Detailed phenotypes from 15,927 participants across 10 domains are also provided. A linkage disequilibrium panel was created using unrelated AD cases and controls.
DISCUSSION: Researchers can access and analyze the genetic data via the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) Data Sharing Service, the VariXam tool, or NIAGADS GenomicsDB.
HIGHLIGHTS: We detailed the genetic architecture and quality of the Alzheimer’s Disease Sequencing Project release 4 whole genome sequences. We identified 435 million single nucleotide polymorphisms, insertions and deletions, and structural variants from diverse genomes. We harmonized extensive phenotypes and built a linkage disequilibrium reference panel on a subset of samples. Data are publicly available at the NIAGADS Data Storage Site, and variants and annotations are browsable on two different websites.

Up to 30% of older adults meet pathological criteria for a diagnosis of Alzheimer’s disease at autopsy yet never show signs of cognitive impairment. Recent work has highlighted genetic drivers of this resilience, or better-than-expected cognitive performance given a level of neuropathology, that allow the aged brain to protect itself from the downstream consequences of amyloid and tau deposition. However, models of resilience have been constrained by reliance on measures of neuropathology, substantially limiting the number of participants available for analysis. We sought to determine if novel approaches using APOE allele status, age, and other demographic variables as a proxy for neuropathology could still effectively quantify resilience and uncover novel genetic drivers associated with better-than-expected cognitive performance while vastly expanding sample size and statistical power. Leveraging 20,513 participants from eight well-characterized cohort studies of aging, we determined the effects of genetic variants on resilience metrics using mixed-effects regressions. The outcome of interest was residual cognitive resilience, quantified from residuals in three cognitive domains (memory, executive function, and language) and built within two frameworks: “silver” models, which obviate the requirement for neuropathological data (n=17,241), and “gold” models, which include post-mortem neuropathological assessments (n=3,272). We then performed cross-ancestry genome-wide association studies (European ancestry n=18,269, African ancestry n=2,244), gene- and pathway-based tests, and genetic correlation analyses. All analyses were conducted across all participants and repeated when restricted to those with unimpaired cognition at baseline.
Despite different modeling approaches, the silver and gold phenotypes were highly correlated (R=0.77-0.88) and displayed comparable performance in quantifying better- or worse-than-expected cognition, enabling silver-gold meta-analyses. Genetic correlation analyses highlighted associations of resilience with multiple neuropsychiatric and cardiovascular traits (PFDR values < 5.0×10⁻²). In pathway-level tests, we observed three significant associations with resilience: metabolism of amino acids and derivatives (PFDR=4.1×10⁻²), negative regulation of transforming growth factor beta production (PFDR=1.9×10⁻²), and severe acute respiratory syndrome (PFDR=3.9×10⁻⁴). Finally, in single-variant analyses, we identified a locus on chromosome 17 approaching genome-wide significance among cognitively unimpaired participants (index single nucleotide polymorphism: rs757022, minor allele frequency = 0.18, β=0.08, P=1.1×10⁻⁷). The top variant at this locus (rs757022) was significantly associated with expression of numerous ATP-binding cassette genes in brain. Overall, through validating a novel modeling approach, we demonstrate the utility of silver models of resilience to increase statistical power and participant diversity.
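
To make the idea of residual resilience concrete, the sketch below regresses a cognitive score on predictors and keeps the residual as the resilience phenotype. It is only an illustration under stated assumptions: the data are simulated, ordinary least squares stands in for whatever modeling the study actually used, and the names `residual_resilience`, `silver`, and `gold` are hypothetical. The "silver" fit uses only age and APOE ε4 count as proxies, while the "gold" fit adds a simulated neuropathology measure.

```python
import numpy as np

def residual_resilience(cognition, predictors):
    """Return OLS residuals of cognition regressed on predictors (plus intercept).

    A positive residual means better-than-expected cognition given the predictors.
    """
    X = np.column_stack([np.ones(len(cognition)), predictors])
    coef, *_ = np.linalg.lstsq(X, cognition, rcond=None)
    return cognition - X @ coef

# Simulated toy data (assumption: none of these values come from the study).
rng = np.random.default_rng(0)
n = 500
age = rng.normal(75, 7, n)
apoe4 = rng.integers(0, 3, n)                      # number of APOE e4 alleles: 0, 1, or 2
pathology = 0.5 * apoe4 + rng.normal(0, 1, n)      # toy neuropathology burden
memory = -0.03 * (age - 75) - 0.2 * apoe4 - 0.4 * pathology + rng.normal(0, 1, n)

# "Silver": proxies only. "Gold": proxies plus measured neuropathology.
silver = residual_resilience(memory, np.column_stack([age, apoe4]))
gold = residual_resilience(memory, np.column_stack([age, apoe4, pathology]))

# Correlation between the two resilience phenotypes in this toy example.
r = np.corrcoef(silver, gold)[0, 1]
print(f"silver-gold correlation: {r:.2f}")
```

Because the gold predictors are a superset of the silver predictors, the gold residual in this OLS setting is a further projection of the silver residual, so the two are always positively correlated; how strongly depends on how much the extra neuropathology term explains.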

BACKGROUND: The 17q21.31 region with various structural forms characterized by the H1/H2 haplotypes and three large copy number variations (CNVs) represents the strongest risk locus in progressive supranuclear palsy (PSP).
OBJECTIVE: To investigate the association between CNVs and structural forms on 17q21.31 with the risk of PSP.
METHODS: Utilizing whole genome sequencing data from 1684 PSP cases and 2392 controls, the three large CNVs (α, β, and γ) and structural forms within 17q21.31 were identified and analyzed for their association with PSP.
RESULTS: We found that the copy number of γ was associated with increased PSP risk (odds ratio [OR] = 1.10, P = 0.0018). From H1β1γ1 (OR = 1.21) and H1β2γ1 (OR = 1.24) to H1β1γ4 (OR = 1.57), structural forms of H1 with additional copies of γ displayed a higher risk for PSP. The frequency of the risk sub-haplotype H1c rises from 1% in individuals with two γ copies to 88% in those with eight copies. Additionally, γ duplication up-regulates expression of ARL17B, LRRC37A/LRRC37A2, and NSFP1, while down-regulating KANSL1. Single-nucleus RNA-seq analysis of the dorsolateral prefrontal cortex reveals that γ duplication primarily up-regulates LRRC37A/LRRC37A2 in neuronal cells.
CONCLUSIONS: The copy number of γ is associated with the risk of PSP after adjusting for H1/H2, indicating that the complex structure at 17q21.31 is an important consideration when evaluating the genetic risk of PSP. © 2025 The Author(s). Movement Disorders published by Wiley Periodicals LLC on behalf of International Parkinson and Movement Disorder Society.

MOTIVATION: Summary statistics from genome-wide association studies (GWAS) are widely used in fine-mapping and colocalization analyses to identify causal variants and their enrichment in functional contexts, such as affected cell types and genomic features. With the expansion of functional genomic (FG) datasets, which now include hundreds of thousands of tracks across various cell and tissue types, it is critical to establish scalable algorithms integrating thousands of diverse FG annotations with GWAS results.
RESULTS: We propose BTS (Bayesian Tissue Score), a novel, highly efficient algorithm uniquely designed for 1) identifying affected cell types and functional elements (context-mapping) and 2) fine-mapping potentially causal variants in a context-specific manner using large collections of cell type-specific FG annotation tracks. BTS leverages GWAS summary statistics and annotation-specific Bayesian models to analyze genome-wide annotation tracks, including enhancers, open chromatin, and histone marks. We evaluated BTS on GWAS summary statistics for immune and cardiovascular traits, such as Inflammatory Bowel Disease (IBD), Rheumatoid Arthritis (RA), Systemic Lupus Erythematosus (SLE), and Coronary Artery Disease (CAD). Our results demonstrate that BTS is over 100x more efficient in estimating functional annotation effects and context-specific variant fine-mapping compared to existing methods. Importantly, this large-scale Bayesian approach prioritizes both known and novel annotations, cell types, genomic regions, and variants and provides valuable biological insights into the functional contexts of these diseases.
AVAILABILITY AND IMPLEMENTATION: Docker image is available at https://hub.docker.com/r/wanglab/bts with pre-installed BTS R package (https://bitbucket.org/wanglab-upenn/BTS-R) and BTS GWAS summary statistics analysis pipeline (https://bitbucket.org/wanglab-upenn/bts-pipeline).

The Alzheimer’s Disease Sequencing Project (ADSP) is a national initiative to understand the genetic architecture of Alzheimer’s Disease and Related Dementias (AD/ADRD) by sequencing whole genomes of affected participants and age-matched cognitive controls from diverse populations. The Genome Center for Alzheimer’s Disease (GCAD) processed whole-genome sequencing data from 36,361 ADSP participants, including 35,014 genetically unique participants, of whom 45% are of non-European ancestry, across 17 cohorts in 14 countries in this fourth release (R4). This sequencing effort identified 387 million bi-allelic variants, 42 million short insertions/deletions, and 2.2 million structural variants. Annotations and quality control data are available for all variants and samples. Detailed phenotypes from 15,927 participants across 10 domains are also provided. A linkage disequilibrium panel was created using unrelated AD cases and controls. Researchers can access and analyze the genetic data via NIAGADS Data Sharing Service, the VariXam tool, or NIAGADS GenomicsDB.

MOTIVATION: Chromatin conformation capture (CCC) experiments, such as Hi-C and Capture Hi-C (CHiC), elucidate the three-dimensional organization of the genome and the epigenetic regulatory structures within it. CCC experiments produce large amounts of FASTQ sequencing data with a substantial amount of technical noise and require sophisticated computational pipelines to extract meaningful results. Large-scale CCC data repositories like 4D Nucleome and ENCODE mostly provide raw contact information but lack annotated, statistically significant interaction data suitable for downstream genetic and genomic analyses.
RESULTS: Here, we present CHARMER, an end-to-end pipeline integrated across multiple CCC assay types (Hi-C, CHiC) that generates statistically significant, harmonized, queryable chromatin interactions in a consistent BED-like format across cell/tissue types and CCC assays.
AVAILABILITY: CHARMER is freely available at https://bitbucket.org/wanglab-upenn/CHARMER and harmonized chromatin interaction data will be available in the upcoming version of the FILER database (https://lisanwanglab.org/FILER).

Copy number variants (CNVs) are DNA gains or losses involving >50 base pairs. Assessing CNV effects on disease risk requires consideration of several factors. First, there are no natural definitions for CNV loci. Second, CNV effects can depend on dosage and length. Third, CNV effects can be more accurately estimated when all CNV events in a genomic region are analyzed together to assess their joint effects. We propose a new framework for association analysis that directly models an individual’s entire CNV profile within a genomic region. This framework represents an individual’s CNVs using a CNV profile curve to capture variations in CNV length and dosage and to bypass the need to predefine CNV loci. CNV effects are estimated at each genome position, making the results comparable across different studies. To jointly estimate the effects of all CNVs, we use a Lasso penalty to select CNVs associated with the trait and integrate a weighted L2-fusion penalty to encourage similar effects of adjacent CNVs when supported by the data. Simulations show that the proposed model can more effectively identify causal CNVs while maintaining false positive rates comparable to baseline methods and yield more precise effect-size estimates across different settings. When applied to CNVs derived from whole genome sequencing data of the Alzheimer’s Disease Sequencing Project, the proposed methods identify additional CNVs associated with Alzheimer’s Disease (AD). These identified CNVs overlap with several known AD-risk genes and are significantly enriched for biological processes related to neuron structures and functions crucial in AD development.
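
The two ingredients described above, a CNV profile curve and a combined Lasso plus weighted L2-fusion penalty, can be sketched in a few lines. This is a toy illustration, not the published implementation: the per-base array representation, the function names `cnv_profile` and `fused_lasso_penalty`, the dosage coding (negative for deletions, positive for duplications), and all numbers are assumptions made for the example.

```python
import numpy as np

def cnv_profile(cnvs, region_start, region_end):
    """Build a piecewise-constant dosage curve over [region_start, region_end).

    cnvs: iterable of (start, end, dosage) with half-open coordinates, where
    dosage is the copy-number change (e.g. -1 deletion, +1 or +2 duplication).
    """
    profile = np.zeros(region_end - region_start)
    for start, end, dosage in cnvs:
        lo = max(start, region_start) - region_start
        hi = min(end, region_end) - region_start
        if lo < hi:  # skip CNVs that fall entirely outside the region
            profile[lo:hi] += dosage
    return profile

def fused_lasso_penalty(beta, weights, lam_lasso, lam_fusion):
    """Lasso term plus weighted squared differences between adjacent effects."""
    lasso = lam_lasso * np.abs(beta).sum()
    fusion = lam_fusion * np.sum(weights * np.diff(beta) ** 2)
    return lasso + fusion

# One hypothetical individual: a 60 bp deletion and a 40 bp two-copy gain.
profile = cnv_profile([(100, 160, -1), (180, 220, 2)], 100, 250)

# Penalty for four adjacent effect sizes with equal fusion weights.
value = fused_lasso_penalty(np.array([0.0, 0.5, 0.5, 0.0]), np.ones(3), 0.1, 2.0)
print(profile[55:65], value)
```

The Lasso term pushes most effects to exactly zero, while the fusion term is small when neighbouring effects are similar; the weights let the data decide how strongly each adjacent pair is tied together. A per-base array is only practical for a toy region, and a real analysis would use a segment-based representation.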

BACKGROUND: Blood-derived mitochondrial DNA copy number (mtDNA-CN) is a proxy measurement of mitochondrial function in the peripheral and central systems. Abnormal mtDNA-CN not only indicates impaired mtDNA replication and transcription machinery but also dysregulated biological processes such as energy and lipid metabolism. However, the relationship between mtDNA-CN and Alzheimer disease (AD) is unclear.
METHODS: We performed two-sample Mendelian randomization (MR) using publicly available summary statistics from GWAS for mtDNA-CN and AD to investigate the causal relationship between mtDNA-CN and AD. We estimated mtDNA-CN using whole-genome sequence data from blood and brain samples of 13,799 individuals from the Alzheimer’s Disease Sequencing Project. Linear and Cox proportional hazards models adjusting for age, sex, and study phase were used to assess the association of mtDNA-CN with AD. The association of AD biomarkers and serum metabolites with mtDNA-CN in blood was evaluated in Alzheimer’s Disease Neuroimaging Initiative using linear regression. We conducted a causal mediation analysis to test the natural indirect effects of mtDNA-CN change on AD risk through the significantly associated biomarkers and metabolites.
RESULTS: MR analysis suggested a causal relationship between decreased blood-derived mtDNA-CN and increased risk of AD (OR = 0.68; P = 0.013). Survival analysis showed that decreased mtDNA-CN was significantly associated with higher risk of conversion from mild cognitive impairment to AD (HR = 0.80; P = 0.002). We also identified significant associations of mtDNA-CN with brain FDG-PET (β = 0.103; P = 0.022), amyloid-PET (β = 0.117; P = 0.034), CSF amyloid-β (Aβ) 42/40 (β = -0.124; P = 0.017), CSF t-Tau (β = 0.128; P = 0.015), p-Tau (β = 0.140; P = 0.008), and plasma NFL (β = -0.124; P = 0.004) in females. Several lipid species, amino acids, and biogenic amines in serum were also significantly associated with mtDNA-CN. Causal mediation analyses showed that about a third of the effect of mtDNA-CN on AD risk was mediated by plasma NFL (P = 0.009), and this effect was more significant in females (P < 0.005).
CONCLUSIONS: Our study indicates that mtDNA-CN measured in blood is predictive of AD and is associated with AD biomarkers, including plasma NFL, particularly in females. Further, we illustrate that decreased mtDNA-CN possibly increases AD risk through dysregulation of mitochondrial lipid metabolism and inflammation.
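
For readers unfamiliar with two-sample MR, the sketch below shows one common estimator, the fixed-effect inverse-variance weighted (IVW) method, applied to summary statistics. The abstract does not say which MR estimator was used, so IVW is an assumption here, and the function name `ivw_mr` and the three variants' effect sizes are invented purely for illustration.

```python
import numpy as np

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate and its standard error.

    Each variant contributes a Wald ratio (outcome effect / exposure effect),
    weighted by beta_exposure**2 / se_outcome**2.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    weights = bx**2 / se**2
    estimate = np.sum(weights * (by / bx)) / np.sum(weights)
    std_error = np.sqrt(1.0 / np.sum(weights))
    return estimate, std_error

# Toy summary statistics (assumption: not taken from any real GWAS).
bx = [0.10, 0.20, 0.15]        # variant effects on the exposure (e.g. mtDNA-CN)
by = [-0.05, -0.10, -0.075]    # variant effects on the outcome (log odds of AD)
se = [0.02, 0.02, 0.02]        # standard errors of the outcome effects

log_or, log_or_se = ivw_mr(bx, by, se)
odds_ratio = np.exp(log_or)    # odds ratio per unit increase in the exposure
print(log_or, log_or_se, odds_ratio)
```

An odds ratio below 1 per unit increase in the exposure, as in the reported OR of 0.68, means higher mtDNA-CN goes with lower AD risk, which is the same statement as decreased mtDNA-CN going with increased risk.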

INTRODUCTION: Alzheimer’s disease (AD) is a common disorder of the elderly that is both highly heritable and genetically heterogeneous.
METHODS: We investigated the association of AD with both common variants and aggregates of rare coding and non-coding variants in 13,371 individuals of diverse ancestry with whole genome sequencing (WGS) data.
RESULTS: Pooled-population analyses of all individuals identified genetic variants at apolipoprotein E (APOE) and BIN1 associated with AD (p < 5 × 10⁻⁸). Subgroup-specific analyses identified a haplotype on chromosome 14 including PSEN1 associated with AD in Hispanics, further supported by aggregate testing of rare coding and non-coding variants in the region. Common variants in LINC00320 were associated with AD in Black individuals (p = 1.9 × 10⁻⁹). Finally, we observed rare non-coding variants in the promoter of TOMM40 distinct from APOE in pooled-population analyses (p = 7.2 × 10⁻⁸).
DISCUSSION: We observed that complementary pooled-population and subgroup-specific analyses offered unique insights into the genetic architecture of AD.
HIGHLIGHTS: We determined the association of genetic variants with Alzheimer’s disease (AD) using 13,371 individuals of diverse ancestry with whole genome sequencing (WGS) data. We identified genetic variants at apolipoprotein E (APOE), BIN1, PSEN1, and LINC00320 associated with AD. We observed rare non-coding variants in the promoter of TOMM40 distinct from APOE.

Progressive supranuclear palsy (PSP), a rare Parkinsonian disorder, is characterized by problems with movement, balance, and cognition. PSP differs from Alzheimer’s disease (AD) and other diseases, displaying abnormal microtubule-associated protein tau in both neuronal and glial cell pathologies. Genetic contributors may mediate these differences; however, the genetics of PSP remain underexplored. Here we conduct the largest genome-wide association study (GWAS) of PSP, which includes 2779 cases (2595 neuropathologically confirmed) and 5584 controls, and identify six independent PSP susceptibility loci with genome-wide significant (P < 5 × 10⁻⁸) associations, including five known (MAPT, MOBP, STX6, RUNX2, SLCO1A2) and one novel locus (C4A). Integration with cell type-specific epigenomic annotations reveals an oligodendrocytic signature that might distinguish PSP from AD and Parkinson’s disease in subsequent studies. Candidate PSP risk gene prioritization using expression quantitative trait loci (eQTLs) identifies oligodendrocyte-specific effects on gene expression in half of the genome-wide significant loci, and an association with C4A expression in brain tissue, which may be driven by increased C4A copy number. Finally, histological studies demonstrate tau aggregates in oligodendrocytes that colocalize with C4 (complement) deposition. Integrating GWAS with functional studies, epigenomic and eQTL analyses, we identify potential causal roles for variation in MOBP, STX6, RUNX2, SLCO1A2, and C4A in PSP pathogenesis.