Blog

MOTIVATION: Small non-coding RNAs (sncRNAs) [...] 800 curated experiments from ENCODE and GEO/SRA, across multiple RNA-seq protocols and for both the GRCh38/hg38 and GRCh37/hg19 assemblies, are integrated in DASHR. Moreover, DASHR is the first to contain both known and novel, previously un-annotated sncRNA loci identified by unsupervised segmentation (13 times more loci with 1 678 800 total). Additionally, DASHR v2.0 adds >3 200 000 annotations for non-small RNA genes and other genomic features (long-noncoding RNAs, mRNAs, promoters, repeats). Furthermore, DASHR v2.0 introduces an enhanced user interface, an interactive experiment-by-locus table view, and sncRNA locus sorting and filtering by biological features. All annotation and expression information is directly downloadable and accessible as UCSC genome browser tracks.
AVAILABILITY AND IMPLEMENTATION: DASHR v2.0 is freely available at https://lisanwanglab.org/DASHRv2.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

The poor outcomes in infant acute lymphoblastic leukemia (ALL) necessitate new treatments. Here we discover that EIF4E protein is elevated in most cases of infant ALL and test EIF4E targeting by the repurposed antiviral agent ribavirin, which has anticancer properties through EIF4E inhibition, as a potential treatment. We find that ribavirin treatment of actively dividing infant ALL cells on bone marrow stromal cells (BMSCs) at clinically achievable concentrations causes robust proliferation inhibition in proportion with EIF4E expression. Further, we find that ribavirin treatment of KMT2A-rearranged (KMT2A-R) infant ALL cells and the KMT2A-AFF1 cell line RS4:11 inhibits EIF4E, leading to decreases in oncogenic EIF4E-regulated cell growth and survival proteins. In ribavirin-sensitive KMT2A-R infant ALL cells and RS4:11 cells, EIF4E-regulated proteins with reduced levels of expression following ribavirin treatment include MYC, MCL1, NBN, BCL2 and BIRC5. Ribavirin-treated RS4:11 cells exhibit impaired EIF4E-dependent nuclear to cytoplasmic export and/or translation of the corresponding mRNAs, as well as reduced phosphorylation of the p-AKT1, p-EIF4EBP1, p-RPS6 and p-EIF4E signaling proteins. This leads to an S-phase cell cycle arrest in RS4:11 cells corresponding to the decreased proliferation. Ribavirin causes nuclear EIF4E to re-localize to the cytoplasm in KMT2A-AFF1 infant ALL and RS4:11 cells, providing further evidence for EIF4E inhibition. Ribavirin slows increases in peripheral blasts in KMT2A-R infant ALL xenograft-bearing mice. Ribavirin cooperates with chemotherapy, particularly L-asparaginase, in reducing live KMT2A-AFF1 infant ALL cells in BMSC co-cultures. This work establishes that EIF4E is broadly elevated across infant ALL and that clinically relevant ribavirin exposures have preclinical activity and effectively inhibit EIF4E in KMT2A-R cases, suggesting promise in EIF4E targeting using ribavirin as a means of treatment.

Risk for late-onset Alzheimer’s disease (LOAD), the most prevalent dementia, is partially driven by genetics. To identify LOAD risk loci, we performed a large genome-wide association meta-analysis of clinically diagnosed LOAD (94,437 individuals). We confirm 20 previous LOAD risk loci and identify five new genome-wide loci (IQCK, ACE, ADAM10, ADAMTS1, and WWOX), two of which (ADAM10, ACE) were identified in a recent genome-wide association (GWAS)-by-familial-proxy of Alzheimer’s or dementia. Fine-mapping of the human leukocyte antigen (HLA) region confirms the neurological and immune-mediated disease haplotype HLA-DR15 as a risk factor for LOAD. Pathway analysis implicates immunity, lipid metabolism, tau binding proteins, and amyloid precursor protein (APP) metabolism, showing that genetic variants affecting APP and Aβ processing are associated not only with early-onset autosomal dominant Alzheimer’s disease but also with LOAD. Analyses of risk genes and pathways show enrichment for rare variants (P = 1.32 × 10⁻⁷), indicating that additional rare variants remain to be identified. We also identify important genetic correlations between LOAD and traits such as family history of dementia and education.
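
The abstract above does not say how per-study results were combined. As an illustration only, the sketch below shows one common way a GWAS meta-analysis pools a single variant's effect across cohorts: an inverse-variance-weighted fixed-effect estimate. The function name, the example numbers, and the 5 × 10⁻⁸ threshold (a widely used convention, not a value taken from the abstract) are all assumptions.

```python
import math

# Conventional genome-wide significance threshold (assumption, not from the abstract).
GENOME_WIDE_P = 5e-8


def fixed_effect_meta(betas, ses):
    """Pool per-study effect sizes with inverse-variance weights.

    betas: per-study effect estimates (e.g. log odds ratios) for one variant.
    ses:   matching standard errors.
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and the same length")
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal null.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


if __name__ == "__main__":
    # Hypothetical effect sizes from three cohorts for a single variant.
    beta, se, z, p = fixed_effect_meta([0.10, 0.12, 0.08], [0.03, 0.04, 0.05])
    print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.3g}")
    print("genome-wide significant:", p < GENOME_WIDE_P)
```

Studies with smaller standard errors get more weight, so the pooled estimate is pulled toward the more precise cohorts.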

Most of the loci identified by genome-wide association studies (GWAS) for late-onset Alzheimer’s disease (LOAD) are in strong linkage disequilibrium (LD) with nearby variants, any of which could be the actual functional variant, often in non-protein-coding regions and implicating underlying gene regulatory mechanisms. We set out to characterize the causal variants, regulatory mechanisms, tissue contexts, and target genes underlying these associations. We applied our INFERNO algorithm to the top 19 non-APOE loci from the IGAP GWAS. INFERNO annotated all LD-expanded variants at each locus with tissue-specific regulatory activity. Bayesian co-localization analysis of summary statistics and eQTL data was performed to identify tissue-specific target genes. INFERNO identified enhancer dysregulation in all 19 tag regions analyzed, significant enrichments of enhancer overlaps in the immune-related blood category, and co-localized eQTL signals overlapping enhancers from the matching tissue class in ten regions (ABCA7, BIN1, CASS4, CD2AP, CD33, CELF1, CLU, EPHA1, FERMT2, ZCWPW1). In several cases, we identified dysregulation of long noncoding RNA (lncRNA) transcripts and applied the lncRNA target identification algorithm from INFERNO to characterize their downstream biological effects. We also validated the allele-specific effects of several variants on enhancer function using luciferase expression assays. By integrating functional genomics with GWAS signals, our analysis yielded insights into the regulatory mechanisms, tissue contexts, genes, and biological processes affected by noncoding genetic variation associated with LOAD risk.

The majority of variants identified by genome-wide association studies (GWAS) reside in the noncoding genome, affecting regulatory elements including transcriptional enhancers. However, characterizing their effects requires the integration of GWAS results with context-specific regulatory activity and linkage disequilibrium annotations to identify causal variants underlying noncoding association signals and the regulatory elements, tissue contexts, and target genes they affect. We propose INFERNO, a novel method which integrates hundreds of functional genomics datasets spanning enhancer activity, transcription factor binding sites, and expression quantitative trait loci with GWAS summary statistics. INFERNO includes novel statistical methods to quantify empirical enrichments of tissue-specific enhancer overlap and to identify co-regulatory networks of dysregulated long noncoding RNAs (lncRNAs). We applied INFERNO to two large GWAS studies. For schizophrenia (36,989 cases, 113,075 controls), INFERNO identified putatively causal variants affecting brain enhancers for known schizophrenia-related genes. For inflammatory bowel disease (IBD) (12,882 cases, 21,770 controls), INFERNO found enrichments of immune and digestive enhancers and lncRNAs involved in regulation of the adaptive immune response. In summary, INFERNO comprehensively infers the molecular mechanisms of causal noncoding variants, providing a sensitive hypothesis generation method for post-GWAS analysis. The software is available as an open source pipeline and a web server.
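
The idea of an empirical enrichment of enhancer overlap can be illustrated with a small sampling test. This is a minimal sketch, not INFERNO's actual procedure: it assumes a single chromosome, non-overlapping half-open enhancer intervals, and a plain random draw from a background variant list. INFERNO's own background sampling and statistics are not described in the abstract. All function names and inputs are hypothetical.

```python
import random
from bisect import bisect_right


def count_overlaps(positions, starts, ends):
    """Count positions that fall inside any half-open [start, end) interval.

    starts/ends must describe sorted, non-overlapping intervals.
    """
    hits = 0
    for pos in positions:
        i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < ends[i]:
            hits += 1
    return hits


def empirical_enrichment(variants, background, enhancers, n_samples=1000, seed=0):
    """Empirical p-value for 'variants overlap enhancers more often than chance'.

    variants:   positions of the variants of interest.
    background: list of candidate positions to sample null sets from.
    enhancers:  list of (start, end) tuples for one tissue class.
    Returns (observed_overlaps, empirical_p).
    """
    enhancers = sorted(enhancers)
    starts = [s for s, _ in enhancers]
    ends = [e for _, e in enhancers]
    rng = random.Random(seed)
    observed = count_overlaps(variants, starts, ends)
    at_least_as_extreme = 0
    for _ in range(n_samples):
        null_set = rng.sample(background, len(variants))
        if count_overlaps(null_set, starts, ends) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the empirical p-value strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_samples + 1)


if __name__ == "__main__":
    enhancers = [(100, 200), (300, 400)]
    variants = [150, 350, 120]
    background = list(range(1000))
    print(empirical_enrichment(variants, background, enhancers))
```

In practice the null sets would need to be matched to the real variants on properties that affect overlap rates, and the test would be repeated per tissue class with multiple-testing correction.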

Motivation: Annotation of genomic variants is an increasingly important and complex part of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare variants, though to date most studies use limited information for assessing variant function that is often agnostic of the disease being studied.
Results: In this work, we outline an annotation process motivated by the Alzheimer’s Disease Sequencing Project, illustrate the impact of including tissue-specific transcript sets and sources of gene regulatory information and assess the potential impact of changing genomic builds on the annotation process. While these factors only impact a small proportion of total variant annotations (∼5%), they influence the potential analysis of a large fraction of genes (∼25%).
Availability and implementation: Individual variant annotations are available via the NIAGADS GenomicsDB, at https://www.niagads.org/genomics/tools-and-software/databases/genomics-database. Annotations are also available for bulk download at https://www.niagads.org/datasets. Annotation processing software is available at http://www.icompbio.net/resources/software-and-downloads/.
Supplementary information: Supplementary data are available at Bioinformatics online.

Motivation: Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies are performing CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, previous single-nucleotide polymorphism (SNP)-array data are available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. The detection of CNVs using sequencing data alone can also be further improved by the utilization of allele-specific reads.
Results: We propose a statistical framework, the integrated CNV (iCNV) detection algorithm, which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP and sequencing data. iCNV applies platform-specific normalization, utilizes allele-specific reads from sequencing and integrates matched NGS and SNP-array data by a hidden Markov model. We compare integrated two-platform CNV detection using iCNV to naïve intersection or union of platforms and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data only and show that the utilization of allele-specific reads improves CNV detection accuracy compared to existing methods.
Availability and implementation: https://github.com/zhouzilu/iCNV.
Supplementary information: Supplementary data are available at Bioinformatics online.
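
To illustrate how a hidden Markov model can segment copy number states, here is a minimal, generic Viterbi sketch over normalized log2 coverage ratios. It is not the iCNV implementation: the three states, the Gaussian emission means, the shared standard deviation, and the "sticky" transition probability are all illustrative assumptions, and the platform-specific normalization and allele-specific signals used by iCNV are not modeled.

```python
import math

# Illustrative parameters (assumptions, not taken from iCNV).
STATES = ("del", "normal", "dup")
MEANS = {"del": -1.0, "normal": 0.0, "dup": 0.58}  # about log2(1/2), log2(2/2), log2(3/2)
SIGMA = 0.3   # shared emission standard deviation
STAY = 0.98   # probability of remaining in the same state between adjacent bins


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def log_trans(prev, cur):
    """Log transition probability with a strong preference for staying put."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1.0 - STAY) / (len(STATES) - 1))


def viterbi(log_ratios):
    """Return the most likely state path for a sequence of log2 ratios."""
    if not log_ratios:
        return []
    # Uniform prior over states for the first bin.
    score = {
        s: math.log(1.0 / len(STATES)) + log_gauss(log_ratios[0], MEANS[s], SIGMA)
        for s in STATES
    }
    backpointers = []
    for x in log_ratios[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: score[r] + log_trans(r, s))
            new_score[s] = (
                score[best_prev] + log_trans(best_prev, s) + log_gauss(x, MEANS[s], SIGMA)
            )
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    ratios = [0.0] * 5 + [-1.0] * 5 + [0.0] * 5
    print(viterbi(ratios))
```

The sticky transitions are what make the model ignore a single noisy bin while still calling a run of consistently shifted bins as a deletion or duplication.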

BACKGROUND: Progressive supranuclear palsy (PSP) is a parkinsonian neurodegenerative tauopathy affecting brain regions involved in motor function, including the basal ganglia, diencephalon and brainstem. While PSP is largely considered to be a sporadic disorder, cases with suspected familial inheritance have been identified and the common MAPT H1 haplotype is a major genetic risk factor. Due to the relatively low prevalence of PSP, large sample sizes can be difficult to achieve, and this has limited the ability to detect true genetic risk factors at the genome-wide statistical threshold for significance in GWAS data. With this in mind, in this study we genotyped the genetic variants that displayed the strongest degree of association with PSP (P<1E-4) in the previous GWAS in a new cohort of 533 pathologically-confirmed PSP cases and 1172 controls, and performed a combined analysis with the previous GWAS data.
RESULTS: Our findings validate the known association of loci at MAPT, MOBP, EIF2AK3 and STX6 with risk of PSP, and uncover novel associations with SLCO1A2 (rs11568563) and DUSP10 (rs6687758) variants, both of which were classified as non-significant in the original GWAS.
CONCLUSIONS: Resolving the genetic architecture of PSP will provide mechanistic insights and nominate candidate genes and pathways for future therapeutic intervention strategies.

The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.

Aging is the strongest risk factor for Alzheimer’s disease (AD), although the underlying mechanisms remain unclear. The chromatin state, in particular through the mark H4K16ac, has been implicated in aging and thus may play a pivotal role in age-associated neurodegeneration. Here we compare the genome-wide enrichment of H4K16ac in the lateral temporal lobe of AD individuals against both younger and elderly cognitively normal controls. We found that while normal aging leads to H4K16ac enrichment, AD entails dramatic losses of H4K16ac in the proximity of genes linked to aging and AD. Our analysis highlights the presence of three classes of AD-related changes with distinctive functional roles. Furthermore, we discovered an association between the genomic locations of significant H4K16ac changes with genetic variants identified in prior AD genome-wide association studies and with expression quantitative trait loci. Our results establish the basis for an epigenetic link between aging and AD.