Blog

INTRODUCTION: Altered lipid metabolism is implicated in Alzheimer’s disease (AD), but the mechanisms remain obscure. Aging-related declines in circulating plasmalogens containing omega-3 fatty acids may increase AD risk by reducing plasmalogen availability.
METHODS: We measured four ethanolamine plasmalogens (PlsEtns) and four closely related phosphatidylethanolamines (PtdEtns) from the Alzheimer’s Disease Neuroimaging Initiative (ADNI; n = 1547 serum) and University of Pennsylvania (UPenn; n = 112 plasma) cohorts, and derived indices reflecting PlsEtn and PtdEtn metabolism: PL-PX (PlsEtns), PL/PE (PlsEtn/PtdEtn ratios), and PBV (plasmalogen biosynthesis value; a composite index). We tested associations with baseline diagnosis, cognition, and cerebrospinal fluid (CSF) AD biomarkers.
RESULTS: In ADNI, we found statistically significant negative relationships between AD versus cognitively normal (CN) and PL-PX (P = 0.007) and PBV (P = 0.005), between late mild cognitive impairment (LMCI) versus CN and PL-PX (P = 2.89 × 10⁻⁵) and PBV (P = 1.99 × 10⁻⁴), and between AD versus LMCI and PL/PE (P = 1.85 × 10⁻⁴). In the UPenn cohort, AD versus CN diagnosis associated negatively with PL/PE (P = 0.0191) and PBV (P = 0.0296). In ADNI, cognition was negatively associated with plasmalogen indices, including Alzheimer’s Disease Assessment Scale 13-item cognitive subscale (ADAS-Cog13; PL-PX: P = 3.24 × 10⁻⁶; PBV: P = 6.92 × 10⁻⁵) and Mini-Mental State Examination (MMSE; PL-PX: P = 1.28 × 10⁻⁹; PBV: P = 6.50 × 10⁻⁹). In the UPenn cohort, there was a trend toward a similar relationship of MMSE with PL/PE (P = 0.0949). In ADNI, CSF total-tau was negatively associated with PL-PX (P = 5.55 × 10⁻⁶) and PBV (P = 7.77 × 10⁻⁶). Additionally, CSF t-tau/Aβ1-42 ratio was negatively associated with these same indices (PL-PX, P = 2.73 × 10⁻⁶; PBV, P = 4.39 × 10⁻⁶). In the UPenn cohort, PL/PE was negatively associated with CSF total-tau (P = 0.031) and t-tau/Aβ1-42 (P = 0.021). CSF Aβ1-42 was not significantly associated with any of these indices in either cohort.
DISCUSSION: These data extend previous studies by showing an association of decreased plasmalogen indices with AD, mild cognitive impairment (MCI), cognition, and CSF tau. Future studies are needed to better define mechanistic relationships, and to test the effects of interventions designed to replete serum plasmalogens.

Approximately 30% of older adults exhibit the neuropathological features of Alzheimer’s disease without signs of cognitive impairment. Yet, little is known about the genetic factors that allow these potentially resilient individuals to remain cognitively unimpaired in the face of substantial neuropathology. We performed a large, genome-wide association study (GWAS) of two previously validated metrics of cognitive resilience quantified using a latent variable modelling approach and representing better-than-predicted cognitive performance for a given level of neuropathology. Data were harmonized across 5108 participants from a clinical trial of Alzheimer’s disease and three longitudinal cohort studies of cognitive ageing. All analyses were run across all participants and repeated restricting the sample to individuals with unimpaired cognition to identify variants at the earliest stages of disease. As expected, all resilience metrics were genetically correlated with cognitive performance and educational attainment traits (P-values < 2.5 × 10⁻²⁰), and we observed novel correlations with neuropsychiatric conditions (P-values 0.42) nor associated with APOE (P-values > 0.13). In single variant analyses, we observed a genome-wide significant locus among participants with unimpaired cognition on chromosome 18 upstream of ATP8B1 (index single nucleotide polymorphism rs2571244, minor allele frequency = 0.08, P = 2.3 × 10⁻⁸). The top variant at this locus (rs2571244) was significantly associated with methylation in prefrontal cortex tissue at multiple CpG sites, including one just upstream of ATP8B1 (cg19596477; P = 2 × 10⁻¹³). Overall, this comprehensive genetic analysis of resilience implicates a putative role of vascular risk, metabolism, and mental health in protection from the cognitive consequences of neuropathology, while also providing evidence for a novel resilience gene along the bile acid metabolism pathway. Furthermore, the genetic architecture of resilience appears to be distinct from that of clinical Alzheimer’s disease, suggesting that a shift in focus to molecular contributors to resilience may identify novel pathways for therapeutic targets.

The Alzheimer’s Disease Sequencing Project (ADSP) undertook whole exome sequencing in 5,740 late-onset Alzheimer disease (AD) cases and 5,096 cognitively normal controls primarily of European ancestry (EA), among whom 218 cases and 177 controls were Caribbean Hispanic (CH). An age-, sex- and APOE-based risk score and family history were used to select cases most likely to harbor novel AD risk variants and controls least likely to develop AD by age 85 years. We tested ~1.5 million single nucleotide variants (SNVs) and 50,000 insertion-deletion polymorphisms (indels) for association with AD, using multiple models considering individual variants as well as gene-based tests aggregating rare, predicted functional, and loss-of-function variants. Sixteen single variants and 19 genes that met criteria for significant or suggestive associations after multiple-testing correction were evaluated for replication in four independent samples: three with whole exome sequencing (2,778 cases, 7,262 controls) and one with genome-wide genotyping imputed to the Haplotype Reference Consortium panel (9,343 cases, 11,527 controls). The top findings in the discovery sample were also followed up in the ADSP whole-genome sequenced family-based dataset (197 members of 42 EA families and 501 members of 157 CH families). We identified novel and predicted functional genetic variants in genes previously associated with AD. We also detected associations in three novel genes: IGHG3 (p = 9.8 × 10⁻⁷), an immunoglobulin gene whose antibodies interact with β-amyloid, a long non-coding RNA AC099552.4 (p = 1.2 × 10⁻⁷), and a zinc-finger protein ZNF655 (gene-based p = 5.0 × 10⁻⁶). The latter two suggest an important role for transcriptional regulation in AD pathogenesis.

SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline for characterizing non-coding genome-wide association study (GWAS) findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports the relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that it is more than 60 times more efficient and scales with data size and the amount of computational resources.
AVAILABILITY AND IMPLEMENTATION: SparkINFERNO runs on clusters or a single server with Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno.
CONTACT: lswang@pennmedicine.upenn.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
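
The core integration step described above, matching GWAS variants against functional annotation intervals, can be sketched in a few lines of plain Python. This is only an illustration of the idea, not SparkINFERNO's code: it uses a linear scan rather than Apache Spark or Giggle indexing, and the function name `annotate_variants`, the variant identifiers, and the annotation labels are all hypothetical.

```python
from collections import defaultdict


def annotate_variants(variants, intervals):
    """Map each variant to the labels of the annotation intervals it falls in.

    variants:  iterable of (chrom, pos, variant_id)
    intervals: iterable of (chrom, start, end, label), half-open [start, end)
    Hypothetical helper for illustration; a real pipeline would use an index
    instead of scanning every interval on the chromosome.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, label in intervals:
        by_chrom[chrom].append((start, end, label))

    hits = {}
    for chrom, pos, variant_id in variants:
        hits[variant_id] = sorted(
            label for start, end, label in by_chrom[chrom] if start <= pos < end
        )
    return hits


# Toy inputs (made up for the example).
variants = [("chr1", 150, "variant_A"), ("chr1", 500, "variant_B"), ("chr2", 150, "variant_C")]
intervals = [
    ("chr1", 100, 200, "enhancer:tissueX"),
    ("chr1", 140, 160, "TFBS:factorY"),
    ("chr2", 300, 400, "enhancer:tissueZ"),
]
hits = annotate_variants(variants, intervals)
```

Here `variant_A` overlaps both chr1 annotations, while the other two variants overlap none. The scan costs time proportional to variants × intervals, which is the kind of bottleneck that genomic indexing and distributed execution are meant to remove.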

Most regulatory chromatin interactions are mediated by various transcription factors (TFs) and involve physically interacting elements such as enhancers, insulators or promoters. To map these elements and interactions at a fine scale, we developed HIPPIE2, which analyzes raw reads from high-throughput chromosome conformation capture (Hi-C) experiments to identify the precise loci of physically interacting DNA regions (PIRs). Unlike standard genome binning approaches (e.g. 10-kb to 1-Mb bins), HIPPIE2 dynamically infers the physical locations of PIRs using the distribution of restriction sites to increase analysis precision and resolution. We applied HIPPIE2 to in situ Hi-C datasets across six human cell lines (GM12878, IMR90, K562, HMEC, HUVEC, NHEK) with matched ENCODE/Roadmap functional genomic data. HIPPIE2 detected 1,042,738 distinct PIRs, with high resolution (average PIR length of 1006 bp) and high reproducibility (92.3% in GM12878). PIRs are enriched for epigenetic marks (H3K27ac, H3K4me1) and open chromatin, suggesting active regulatory roles. HIPPIE2 identified 2.8 million significant PIR-PIR interactions, 27.2% of which were enriched for TF binding sites. Of these interactions, 50,608 were enhancer-promoter interactions and were enriched for 33 TFs, including known DNA looping/long-range mediators. These findings demonstrate that the novel dynamic approach of HIPPIE2 (https://bitbucket.com/wanglab-upenn/HIPPIE2) enables the characterization of chromatin and regulatory interactions with high resolution and reproducibility.
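
To make the contrast between fixed-size bins and restriction-site-based regions concrete, here is a toy sketch. It is not HIPPIE2's algorithm: it only shows how a read position maps to a restriction fragment versus a fixed bin. The `GATC` motif (the MboI recognition site) is an assumption about the enzyme, and the sequence, function names, and bin size are made up for the example.

```python
import bisect
import re


def restriction_cut_sites(sequence, motif="GATC"):
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", sequence)]


def fragment_for_position(pos, cut_sites, seq_len):
    """Return the half-open [start, end) restriction fragment containing pos."""
    boundaries = [0] + cut_sites + [seq_len]
    i = bisect.bisect_right(boundaries, pos) - 1
    return boundaries[i], boundaries[i + 1]


def fixed_bin_for_position(pos, bin_size):
    """Return the half-open [start, end) fixed-size bin containing pos."""
    start = (pos // bin_size) * bin_size
    return start, start + bin_size


toy_sequence = "AAGATCTTTTGATCAA"          # hypothetical 16-bp sequence
sites = restriction_cut_sites(toy_sequence)
fragment = fragment_for_position(5, sites, len(toy_sequence))
fixed_bin = fixed_bin_for_position(5, bin_size=4)
```

With this toy input the fragment boundaries follow the two `GATC` sites, so position 5 falls in fragment (2, 10), whereas the fixed bin (4, 8) is placed without regard to where the DNA was actually cut.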

Each additional copy of the apolipoprotein E4 (APOE4) allele is associated with a higher risk of Alzheimer’s dementia, while the APOE2 allele is associated with a lower risk. It is not yet known, however, whether APOE2 homozygotes have a particularly low risk. We generated Alzheimer’s dementia odds ratios and other findings in more than 5,000 clinically and neuropathologically characterized Alzheimer’s dementia cases and controls. APOE2/2 was associated with a low Alzheimer’s dementia odds ratio compared to APOE2/3 and 3/3, and an exceptionally low odds ratio compared to APOE4/4, and the impact of APOE2 and APOE4 gene dose was significantly greater in the neuropathologically confirmed group than in more than 24,000 neuropathologically unconfirmed cases and controls. Finding and targeting the factors by which APOE and its variants influence Alzheimer’s disease could have a major impact on the understanding, treatment and prevention of the disease.
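
For readers unfamiliar with the measure, the sketch below shows how an unadjusted odds ratio and a Wald 95% confidence interval are computed from a 2x2 genotype-by-diagnosis table. The counts are hypothetical and are not the study's data, the function name `odds_ratio_with_ci` is made up for the example, and the study's own estimates may have come from adjusted models rather than this textbook formula.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: cases with the genotype of interest
    b: cases with the reference genotype
    c: controls with the genotype of interest
    d: controls with the reference genotype
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
estimate, lower, upper = odds_ratio_with_ci(a=10, b=1000, c=40, d=800)
```

With these made-up counts the odds ratio is (10 × 800) / (1000 × 40) = 0.2, meaning the odds of being a case are one fifth as high with the genotype of interest as with the reference genotype.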

Recent high-throughput, structure-sensitive, genome-wide sequencing-based assays have enabled large-scale studies of RNA structure, and robust transcriptome-wide computational prediction of individual RNA structures across RNA classes from these assays has the potential to further improve prediction accuracy. Here, we describe HiPR, a novel method for RNA structure prediction at single-nucleotide resolution that combines high-throughput structure probing data (DMS-seq, DMS-MaPseq) with a novel probabilistic folding algorithm. On validation data spanning a variety of RNA classes, HiPR often increases accuracy for predicting RNA structures, giving researchers new tools to study RNA structure.

BACKGROUND: The analysis and interpretation of data generated from patient-derived clinical samples relies on access to high-quality bioinformatics resources. These are maintained and updated by expert curators extracting knowledge from unstructured biological data described in free-text journal articles and converting this into more structured, computationally-accessible forms. This enables analyses such as functional enrichment of sets of genes/proteins using the Gene Ontology, and makes the searching of data more productive by managing issues such as gene/protein name synonyms, identifier mapping, and data quality.
OBJECTIVE: To undertake a coordinated annotation update of key public-domain resources to better support Alzheimer’s disease research.
METHODS: We have systematically identified target proteins critical to the disease process, in part by accessing informed input from the clinical research community.
RESULTS: Data from 954 papers have been added to the UniProtKB, Gene Ontology, and the International Molecular Exchange Consortium (IMEx) databases, with 299 human proteins and 279 orthologs updated in UniProtKB. 745 binary interactions were added to the IMEx human molecular interaction dataset.
CONCLUSION: This represents a significant enhancement in the expert-curated data pertinent to Alzheimer’s disease available in a number of biomedical databases. Relevant protein entries have been updated in UniProtKB and concomitantly in the Gene Ontology. Molecular interaction networks have been significantly extended in the IMEx Consortium dataset and a set of reference protein complexes created. All the resources described are open-source and freely available to the research community, and we provide examples of how these data could be exploited by researchers.

The Alzheimer’s Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed “consensus calling,” to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
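
The per-genotype merge rule implied by the figures above can be sketched as follows. This is an illustration rather than the ADSP QC Working Group's implementation. It assumes that genotypes called by only one pipeline are retained (which the abstract implies but does not spell out), that a missing or filtered call is represented by `None`, and that genotype strings are already normalized so they can be compared directly; the function name `consensus_genotype` is hypothetical.

```python
from typing import Optional


def consensus_genotype(gatk: Optional[str], atlas: Optional[str]) -> Optional[str]:
    """Combine two per-sample genotype calls into one (hypothetical helper).

    None means the pipeline produced no passing call for this genotype.
    """
    if gatk is None and atlas is None:
        return None      # neither pipeline produced a call
    if gatk is None:
        return atlas     # exclusive to Atlas: assumed kept
    if atlas is None:
        return gatk      # exclusive to GATK: assumed kept
    if gatk == atlas:
        return gatk      # concordant: keep
    return None          # discordant: exclude


# Toy calls for one variant across four samples: (GATK, Atlas).
calls = [("0/1", "0/1"), ("0/1", None), (None, "1/1"), ("0/1", "1/1")]
merged = [consensus_genotype(g, a) for g, a in calls]
```

In the toy input, the first three genotypes survive (concordant, GATK-only, Atlas-only) and the fourth is dropped because the two pipelines disagree.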

SUMMARY: We report VCPA, our SNP/Indel Variant Calling Pipeline and data management tool used for the analysis of whole genome and exome sequencing (WGS/WES) for the Alzheimer’s Disease Sequencing Project. VCPA consists of two independent but linkable components: pipeline and tracking database. The pipeline, implemented using the Workflow Description Language and fully optimized for the Amazon elastic compute cloud environment, includes steps from aligning raw sequence reads to variant calling using GATK. The tracking database allows users to view job running status in real time and visualize >100 quality metrics per genome. VCPA is functionally equivalent to the CCDG/TOPMed pipeline. Users can use the pipeline and the dockerized database to process large WGS/WES datasets on Amazon cloud with minimal configuration.
AVAILABILITY AND IMPLEMENTATION: VCPA is released under the MIT license and is available for academic and nonprofit use for free. The pipeline source code and step-by-step instructions are available from the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (http://www.niagads.org/VCPA).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.