Each additional copy of the apolipoprotein E4 (APOE4) allele is associated with a higher risk of Alzheimer’s dementia, while the APOE2 allele is associated with a lower risk. However, it is not yet known whether APOE2 homozygotes have a particularly low risk. We generated Alzheimer’s dementia odds ratios and other findings in more than 5,000 clinically characterized and neuropathologically characterized Alzheimer’s dementia cases and controls. APOE2/2 was associated with low Alzheimer’s dementia odds ratios compared to APOE2/3 and 3/3, and an exceptionally low odds ratio compared to APOE4/4. The impact of APOE2 and APOE4 gene dose was significantly greater in the neuropathologically confirmed group than in more than 24,000 neuropathologically unconfirmed cases and controls. Finding and targeting the factors by which APOE and its variants influence Alzheimer’s disease could have a major impact on the understanding, treatment and prevention of the disease.
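As an illustration of the statistic reported above, the following is a minimal sketch of an odds ratio with a Wald-type 95% confidence interval computed from a 2x2 table of genotype counts. The function name and the counts are hypothetical and are not taken from the study, whose actual models may have included covariate adjustment not shown here.

```python
import math


def odds_ratio(case_exposed, case_reference, control_exposed, control_reference):
    """Return (odds ratio, lower 95% bound, upper 95% bound) for a 2x2 table.

    "Exposed" is the genotype of interest (for example APOE2/2) and
    "reference" is the comparison genotype (for example APOE3/3).
    All four counts must be positive.
    """
    estimate = (case_exposed * control_reference) / (case_reference * control_exposed)
    # Standard error of log(OR) by the Woolf (logit) method.
    se_log = math.sqrt(
        1 / case_exposed
        + 1 / case_reference
        + 1 / control_exposed
        + 1 / control_reference
    )
    z = 1.959964  # two-sided 95% normal quantile
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts for illustration only (not data from the study).
estimate, lower, upper = odds_ratio(5, 1000, 40, 1000)
print(f"OR = {estimate:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

An odds ratio below 1 with an interval that excludes 1 would indicate lower odds of dementia for the genotype of interest relative to the reference genotype.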
Recent high-throughput structure-sensitive genome-wide sequencing-based assays have enabled large-scale studies of RNA structure, and robust transcriptome-wide computational prediction of individual RNA structures across RNA classes from these assays has the potential to further improve prediction accuracy. Here, we describe HiPR, a novel method for RNA structure prediction at single-nucleotide resolution that combines high-throughput structure probing data (DMS-seq, DMS-MaPseq) with a novel probabilistic folding algorithm. On validation data spanning a variety of RNA classes, HiPR often increases accuracy for predicting RNA structures, giving researchers new tools to study RNA structure.
BACKGROUND: The analysis and interpretation of data generated from patient-derived clinical samples relies on access to high-quality bioinformatics resources. These are maintained and updated by expert curators extracting knowledge from unstructured biological data described in free-text journal articles and converting this into more structured, computationally-accessible forms. This enables analyses such as functional enrichment of sets of genes/proteins using the Gene Ontology, and makes the searching of data more productive by managing issues such as gene/protein name synonyms, identifier mapping, and data quality.
OBJECTIVE: To undertake a coordinated annotation update of key public-domain resources to better support Alzheimer’s disease research.
METHODS: We have systematically identified target proteins critical to the disease process, in part by accessing informed input from the clinical research community.
RESULTS: Data from 954 papers have been added to the UniProtKB, Gene Ontology, and the International Molecular Exchange Consortium (IMEx) databases, with 299 human proteins and 279 orthologs updated in UniProtKB. 745 binary interactions were added to the IMEx human molecular interaction dataset.
CONCLUSION: This represents a significant enhancement in the expert curated data pertinent to Alzheimer’s disease available in a number of biomedical databases. Relevant protein entries have been updated in UniProtKB and concomitantly in the Gene Ontology. Molecular interaction networks have been significantly extended in the IMEx Consortium dataset and a set of reference protein complexes created. All the resources described are open-source and freely available to the research community and we provide examples of how these data could be exploited by researchers.
The Alzheimer’s Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed “consensus calling,” to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant, while 10.3% and 28.3% were exclusive to Atlas or GATK, respectively, and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
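The consensus-calling idea and the parent-offspring check described above can be sketched as follows. This is a simplified illustration for a single bi-allelic autosomal site, not the ADSP protocol itself: the function names, the genotype representation (a pair of allele indices, or None for no call), and the rule that a genotype called by only one pipeline is kept are assumptions inferred from the categories reported in the abstract.

```python
from typing import Optional, Tuple

Genotype = Optional[Tuple[int, int]]  # e.g. (0, 1); None means no call


def normalize(gt: Genotype) -> Genotype:
    """Sort alleles so that (1, 0) and (0, 1) compare equal (unphased)."""
    return None if gt is None else tuple(sorted(gt))


def consensus_genotype(gatk: Genotype, atlas: Genotype):
    """Combine two pipelines' calls into (genotype, category).

    Assumed rule: keep concordant calls, keep a call made by only one
    pipeline, and exclude calls on which the two pipelines disagree.
    """
    g, a = normalize(gatk), normalize(atlas)
    if g is None and a is None:
        return None, "missing"
    if g is None:
        return a, "atlas_only"
    if a is None:
        return g, "gatk_only"
    if g == a:
        return g, "concordant"
    return None, "discordant"


def is_mendelian_inconsistent(parent: Genotype, child: Genotype) -> bool:
    """Flag a parent-offspring pair whose genotypes share no allele.

    With one parent only, the detectable inconsistencies at a bi-allelic
    site are opposite homozygotes (for example 0/0 versus 1/1).
    Missing genotypes are not counted as inconsistent.
    """
    p, c = normalize(parent), normalize(child)
    if p is None or c is None:
        return False
    return not (set(p) & set(c))


print(consensus_genotype((0, 1), (1, 0)))
print(consensus_genotype((0, 1), (1, 1)))
print(is_mendelian_inconsistent((0, 0), (1, 1)))
```

Counting inconsistent pairs per genotype class, before and after a candidate filter, is one way such a check could guide genotype-specific thresholds.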
SUMMARY: We report VCPA, our SNP/Indel Variant Calling Pipeline and data management tool used for the analysis of whole genome and exome sequencing (WGS/WES) for the Alzheimer’s Disease Sequencing Project. VCPA consists of two independent but linkable components: pipeline and tracking database. The pipeline, implemented using the Workflow Description Language and fully optimized for the Amazon elastic compute cloud environment, includes steps from aligning raw sequence reads to variant calling using GATK. The tracking database allows users to view job running status in real time and visualize >100 quality metrics per genome. VCPA is functionally equivalent to the CCDG/TOPMed pipeline. Users can use the pipeline and the dockerized database to process large WGS/WES datasets on Amazon cloud with minimal configuration.
AVAILABILITY AND IMPLEMENTATION: VCPA is released under the MIT license and is available for academic and nonprofit use for free. The pipeline source code and step-by-step instructions are available from the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (http://www.niagads.org/VCPA).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Small non-coding RNAs (sncRNAs, [...] 800 curated experiments from ENCODE and GEO/SRA across multiple RNA-seq protocols for both GRCh38/hg38 and GRCh37/hg19 assemblies are integrated in DASHR. Moreover, DASHR is the first to contain both known and novel, previously un-annotated sncRNA loci identified by unsupervised segmentation (13 times more loci with 1 678 800 total). Additionally, DASHR v2.0 adds >3 200 000 annotations for non-small RNA genes and other genomic features (long-noncoding RNAs, mRNAs, promoters, repeats). Furthermore, DASHR v2.0 introduces an enhanced user interface, an interactive experiment-by-locus table view, and sncRNA locus sorting and filtering by biological features. All annotation and expression information is directly downloadable and accessible as UCSC genome browser tracks.
AVAILABILITY AND IMPLEMENTATION: DASHR v2.0 is freely available at https://lisanwanglab.org/DASHRv2.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
The poor outcomes in infant acute lymphoblastic leukemia (ALL) necessitate new treatments. Here we discover that EIF4E protein is elevated in most cases of infant ALL and test EIF4E targeting by the repurposed antiviral agent ribavirin, which has anticancer properties through EIF4E inhibition, as a potential treatment. We find that ribavirin treatment of actively dividing infant ALL cells on bone marrow stromal cells (BMSCs) at clinically achievable concentrations causes robust proliferation inhibition in proportion to EIF4E expression. Further, we find that ribavirin treatment of KMT2A-rearranged (KMT2A-R) infant ALL cells and the KMT2A-AFF1 cell line RS4:11 inhibits EIF4E, leading to decreases in oncogenic EIF4E-regulated cell growth and survival proteins. In ribavirin-sensitive KMT2A-R infant ALL cells and RS4:11 cells, EIF4E-regulated proteins with reduced levels of expression following ribavirin treatment include MYC, MCL1, NBN, BCL2 and BIRC5. Ribavirin-treated RS4:11 cells exhibit impaired EIF4E-dependent nuclear to cytoplasmic export and/or translation of the corresponding mRNAs, as well as reduced levels of the phosphorylated signaling proteins p-AKT1, p-EIF4EBP1, p-RPS6 and p-EIF4E. This leads to an S-phase cell cycle arrest in RS4:11 cells corresponding to the decreased proliferation. Ribavirin causes nuclear EIF4E to re-localize to the cytoplasm in KMT2A-AFF1 infant ALL and RS4:11 cells, providing further evidence for EIF4E inhibition. Ribavirin slows increases in peripheral blasts in KMT2A-R infant ALL xenograft-bearing mice. Ribavirin cooperates with chemotherapy, particularly L-asparaginase, in reducing live KMT2A-AFF1 infant ALL cells in BMSC co-cultures. This work establishes that EIF4E is broadly elevated across infant ALL and that clinically relevant ribavirin exposures have preclinical activity and effectively inhibit EIF4E in KMT2A-R cases, suggesting promise in EIF4E targeting using ribavirin as a means of treatment.
Risk for late-onset Alzheimer’s disease (LOAD), the most prevalent dementia, is partially driven by genetics. To identify LOAD risk loci, we performed a large genome-wide association meta-analysis of clinically diagnosed LOAD (94,437 individuals). We confirm 20 previous LOAD risk loci and identify five new genome-wide loci (IQCK, ACE, ADAM10, ADAMTS1, and WWOX), two of which (ADAM10, ACE) were identified in a recent genome-wide association (GWAS)-by-familial-proxy of Alzheimer’s or dementia. Fine-mapping of the human leukocyte antigen (HLA) region confirms the neurological and immune-mediated disease haplotype HLA-DR15 as a risk factor for LOAD. Pathway analysis implicates immunity, lipid metabolism, tau binding proteins, and amyloid precursor protein (APP) metabolism, showing that genetic variants affecting APP and Aβ processing are associated not only with early-onset autosomal dominant Alzheimer’s disease but also with LOAD. Analyses of risk genes and pathways show enrichment for rare variants (P = 1.32 × 10⁻⁷), indicating that additional rare variants remain to be identified. We also identify important genetic correlations between LOAD and traits such as family history of dementia and education.
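To make the idea of a genome-wide association meta-analysis concrete, here is a minimal sketch of inverse-variance-weighted fixed-effects pooling of per-cohort effect estimates for a single variant. The abstract does not state which meta-analysis model was used, so this common approach is an assumption, and the function name and input values are hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Pool per-cohort log odds ratios for one variant.

    Returns (pooled beta, pooled standard error, z score, two-sided p).
    Each cohort is weighted by the inverse of its squared standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical per-cohort estimates for one variant (illustration only).
beta, se, z, p = inverse_variance_meta([0.10, 0.20], [0.10, 0.10])
print(f"beta = {beta:.3f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.3g}")
```

A variant is conventionally called genome-wide significant when the pooled p-value falls below 5 × 10⁻⁸.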
Most of the loci identified by genome-wide association studies (GWAS) for late-onset Alzheimer’s disease (LOAD) are in strong linkage disequilibrium (LD) with nearby variants, any of which could be the actual functional variant, often in non-protein-coding regions and implicating underlying gene regulatory mechanisms. We set out to characterize the causal variants, regulatory mechanisms, tissue contexts, and target genes underlying these associations. We applied our INFERNO algorithm to the top 19 non-APOE loci from the IGAP GWAS study. INFERNO annotated all LD-expanded variants at each locus with tissue-specific regulatory activity. Bayesian co-localization analysis of summary statistics and eQTL data was performed to identify tissue-specific target genes. INFERNO identified enhancer dysregulation in all 19 tag regions analyzed, significant enrichments of enhancer overlaps in the immune-related blood category, and co-localized eQTL signals overlapping enhancers from the matching tissue class in ten regions (ABCA7, BIN1, CASS4, CD2AP, CD33, CELF1, CLU, EPHA1, FERMT2, ZCWPW1). In several cases, we identified dysregulation of long noncoding RNA (lncRNA) transcripts and applied the lncRNA target identification algorithm from INFERNO to characterize their downstream biological effects. We also validated the allele-specific effects of several variants on enhancer function using luciferase expression assays. By integrating functional genomics with GWAS signals, our analysis yielded insights into the regulatory mechanisms, tissue contexts, genes, and biological processes affected by noncoding genetic variation associated with LOAD risk.
The majority of variants identified by genome-wide association studies (GWAS) reside in the noncoding genome, affecting regulatory elements including transcriptional enhancers. However, characterizing their effects requires the integration of GWAS results with context-specific regulatory activity and linkage disequilibrium annotations to identify causal variants underlying noncoding association signals and the regulatory elements, tissue contexts, and target genes they affect. We propose INFERNO, a novel method which integrates hundreds of functional genomics datasets spanning enhancer activity, transcription factor binding sites, and expression quantitative trait loci with GWAS summary statistics. INFERNO includes novel statistical methods to quantify empirical enrichments of tissue-specific enhancer overlap and to identify co-regulatory networks of dysregulated long noncoding RNAs (lncRNAs). We applied INFERNO to two large GWAS studies. For schizophrenia (36,989 cases, 113,075 controls), INFERNO identified putatively causal variants affecting brain enhancers for known schizophrenia-related genes. For inflammatory bowel disease (IBD) (12,882 cases, 21,770 controls), INFERNO found enrichments of immune and digestive enhancers and lncRNAs involved in regulation of the adaptive immune response. In summary, INFERNO comprehensively infers the molecular mechanisms of causal noncoding variants, providing a sensitive hypothesis generation method for post-GWAS analysis. The software is available as an open source pipeline and a web server.
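The notion of an empirical enrichment of enhancer overlap can be illustrated with a minimal sketch. It counts how many variants fall inside enhancer intervals and compares the observed count with counts from background variant sets. The function names, the half-open interval convention, and the background counts are hypothetical, and the way INFERNO samples its background sets is not described here and is not reproduced.

```python
def count_enhancer_overlaps(variants, enhancers):
    """Count variants that fall inside at least one enhancer interval.

    variants: iterable of (chrom, position)
    enhancers: iterable of (chrom, start, end), half-open [start, end)
    """
    by_chrom = {}
    for chrom, start, end in enhancers:
        by_chrom.setdefault(chrom, []).append((start, end))
    return sum(
        1
        for chrom, pos in variants
        if any(start <= pos < end for start, end in by_chrom.get(chrom, []))
    )


def empirical_enrichment_p(observed, null_counts):
    """One-sided empirical p-value with the usual +1 correction.

    null_counts holds overlap counts from background variant sets
    (for example, sets matched on allele frequency and LD).
    """
    at_least_as_extreme = sum(1 for count in null_counts if count >= observed)
    return (1 + at_least_as_extreme) / (1 + len(null_counts))


# Hypothetical enhancers and variants for illustration only.
enhancers = [("chr1", 100, 200), ("chr2", 50, 60)]
variants = [("chr1", 150), ("chr1", 250), ("chr2", 55)]
observed = count_enhancer_overlaps(variants, enhancers)
print(observed, empirical_enrichment_p(observed, [0, 1, 2, 1, 0, 3, 1, 0, 1]))
```

The +1 correction keeps the p-value above zero, so its resolution is bounded by the number of background sets drawn.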