Blog

Motivation: Annotation of genomic variants is an increasingly important and complex part of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare variants, though to date most studies use limited information for assessing variant function that is often agnostic of the disease being studied.
Results: In this work, we outline an annotation process motivated by the Alzheimer’s Disease Sequencing Project, illustrate the impact of including tissue-specific transcript sets and sources of gene regulatory information and assess the potential impact of changing genomic builds on the annotation process. While these factors only impact a small proportion of total variant annotations (∼5%), they influence the potential analysis of a large fraction of genes (∼25%).
Availability and implementation: Individual variant annotations are available via the NIAGADS GenomicsDB, at https://www.niagads.org/genomics/tools-and-software/databases/genomics-database. Annotations are also available for bulk download at https://www.niagads.org/datasets. Annotation processing software is available at http://www.icompbio.net/resources/software-and-downloads/.
Supplementary information: Supplementary data are available at Bioinformatics online.

Motivation: Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies are performing CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, previous single-nucleotide polymorphism (SNP)-array data are available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. The detection of CNVs using sequencing data alone can also be further improved by the utilization of allele-specific reads.
Results: We propose a statistical framework, the integrated CNV (iCNV) detection algorithm, which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP and sequencing data. iCNV applies platform-specific normalization, utilizes allele-specific reads from sequencing and integrates matched NGS and SNP-array data by a hidden Markov model. We compare integrated two-platform CNV detection using iCNV to naïve intersection or union of platforms and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data only and show that the utilization of allele-specific reads improves CNV detection accuracy compared to existing methods.
Availability and implementation: https://github.com/zhouzilu/iCNV.
Supplementary information: Supplementary data are available at Bioinformatics online.
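
The idea of integrating two platforms through a hidden Markov model can be illustrated with a toy sketch. This is not the iCNV implementation: it assumes three hidden copy-number states, Gaussian emissions for a normalized per-bin signal, conditional independence of the two platforms given the state, and parameter values (`MEANS`, `SD`, `STAY`) that are purely hypothetical. A bin observed on only one platform contributes only that platform's likelihood.

```python
import math

STATES = ("deletion", "normal", "duplication")
# Hypothetical mean of the normalized signal in each state (assumption).
MEANS = {"deletion": -1.0, "normal": 0.0, "duplication": 0.6}
SD = 0.3     # assumed common standard deviation of the signal
STAY = 0.98  # assumed probability of remaining in the same state


def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def log_emission(state, seq_value, array_value):
    """Add log-likelihoods across platforms; None marks a missing observation."""
    total = 0.0
    for value in (seq_value, array_value):
        if value is not None:
            total += log_gauss(value, MEANS[state], SD)
    return total


def log_transition(prev, cur):
    """Symmetric transitions: stay with STAY, otherwise split the remainder."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(seq_values, array_values):
    """Return the most likely state path given both platforms' signals."""
    score = {
        s: math.log(1 / len(STATES)) + log_emission(s, seq_values[0], array_values[0])
        for s in STATES
    }
    back = []
    for i in range(1, len(seq_values)):
        new_score, pointers = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + log_transition(p, cur))
            new_score[cur] = (
                score[best_prev]
                + log_transition(best_prev, cur)
                + log_emission(cur, seq_values[i], array_values[i])
            )
            pointers[cur] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Made-up signals for seven bins; the array platform lacks probes in some bins.
seq = [0.0, 0.1, -0.9, -1.1, -1.0, 0.05, 0.0]
array = [0.05, None, -1.0, None, -0.95, 0.0, None]
print(viterbi(seq, array))
```

Summing log-likelihoods across platforms is the simplest way to let agreeing evidence reinforce a call while still producing a call where one platform is silent, which is the intuition behind preferring a joint model over a naïve intersection or union.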

BACKGROUND: Progressive supranuclear palsy (PSP) is a parkinsonian neurodegenerative tauopathy affecting brain regions involved in motor function, including the basal ganglia, diencephalon and brainstem. While PSP is largely considered to be a sporadic disorder, cases with suspected familial inheritance have been identified and the common MAPT H1 haplotype is a major genetic risk factor. Due to the relatively low prevalence of PSP, large sample sizes can be difficult to achieve, and this has limited the ability to detect true genetic risk factors at the genome-wide statistical threshold for significance in GWAS data. With this in mind, in this study we genotyped the genetic variants that displayed the strongest degree of association with PSP (P<1E-4) in the previous GWAS in a new cohort of 533 pathologically-confirmed PSP cases and 1172 controls, and performed a combined analysis with the previous GWAS data.
RESULTS: Our findings validate the known association of loci at MAPT, MOBP, EIF2AK3 and STX6 with risk of PSP, and uncover novel associations with SLCO1A2 (rs11568563) and DUSP10 (rs6687758) variants, both of which were classified as non-significant in the original GWAS.
CONCLUSIONS: Resolving the genetic architecture of PSP will provide mechanistic insights and nominate candidate genes and pathways for future therapeutic intervention strategies.

The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.

Aging is the strongest risk factor for Alzheimer’s disease (AD), although the underlying mechanisms remain unclear. The chromatin state, in particular through the mark H4K16ac, has been implicated in aging and thus may play a pivotal role in age-associated neurodegeneration. Here we compare the genome-wide enrichment of H4K16ac in the lateral temporal lobe of AD individuals against both younger and elderly cognitively normal controls. We found that while normal aging leads to H4K16ac enrichment, AD entails dramatic losses of H4K16ac in the proximity of genes linked to aging and AD. Our analysis highlights the presence of three classes of AD-related changes with distinctive functional roles. Furthermore, we discovered an association of the genomic locations of significant H4K16ac changes with genetic variants identified in prior AD genome-wide association studies and with expression quantitative trait loci. Our results establish the basis for an epigenetic link between aging and AD.

Objective: To identify rare causal variants underlying known loci that segregate with late-onset Alzheimer’s disease (LOAD) in multiplex families.
Methods: We analyzed whole genome sequences (WGS) from 351 members of 67 Caribbean Hispanic (CH) families from the Dominican Republic and New York multiply affected by LOAD. Members of the 67 CH families and an additional 47 Caucasian families underwent WGS as a part of the Alzheimer’s Disease Sequencing Project (ADSP). All members of the 67 CH families, an additional 48 CH families and an independent CH case-control cohort were subsequently genotyped for validation. Patients met criteria for LOAD, and controls were determined to be dementia free. We investigated rare variants segregating within families and gene-based associations with disease within LOAD GWAS loci.
Results: A variant in AKAP9, p.R434W, segregated significantly with LOAD in two large families (OR = 5.77, 95% CI: 1.07-30.9, P = 0.041). In addition, missense mutations in MYRF and ASRGL1 under previously reported linkage peaks at 7q14.3 and 11q12.3 segregated completely in one family and in follow-up genotyping both were nominally significant (P < 0.05). We also identified rare variants in a number of genes associated with LOAD in prior genome-wide association studies, including CR1 (P = 0.049), BIN1 (P = 0.0098) and SLC24A4 (P = 0.040).
Conclusions and Relevance: Rare variants in multiple genes influence the risk of LOAD in multiplex families. These results suggest that rare variants may underlie loci identified in genome-wide association studies.

BACKGROUND/AIMS: The Alzheimer’s Disease Sequencing Project (ADSP) aims to identify novel genes influencing Alzheimer’s disease (AD). Variants within genes known to cause dementias other than AD have previously been associated with AD risk. We describe evidence of co-segregation and associations between variants in dementia genes and clinically diagnosed AD within the ADSP.
METHODS: We summarize the properties of known pathogenic variants within dementia genes, describe the co-segregation of variants annotated as “pathogenic” in ClinVar and new candidates observed in ADSP families, and test for associations between rare variants in dementia genes in the ADSP case-control study. The participants were clinically evaluated for AD, and they represent European, Caribbean Hispanic, and isolate Dutch populations.
RESULTS/CONCLUSIONS: Pathogenic variants in dementia genes were predominantly rare and conserved coding changes. Pathogenic variants within ARSA, CSF1R, and GRN were observed, and candidate variants in GRN and CHMP2B were nominated in ADSP families. An independent case-control study provided evidence of an association between variants in TREM2, APOE, ARSA, CSF1R, PSEN1, and MAPT and risk of AD. Variants in genes which cause dementing disorders may influence the clinical diagnosis of AD in a small proportion of cases within the ADSP.

Transcriptional enhancers regulate spatio-temporal gene expression. While genomic assays can identify putative enhancers en masse, assigning target genes is a complex challenge. We devised a machine learning approach, McEnhancer, which links target genes to putative enhancers via a semi-supervised learning algorithm that predicts gene expression patterns based on enriched sequence features. Predicted expression patterns were 73-98% accurate, predicted assignments showed strong Hi-C interaction enrichment, enhancer-associated histone modifications were evident, and known functional motifs were recovered. Our model provides a general framework to link globally identified enhancers to targets and contributes to deciphering the regulatory genome.

Importance: It is unclear whether female carriers of the apolipoprotein E (APOE) ε4 allele are at greater risk of developing Alzheimer disease (AD) than men, and the sex-dependent association of mild cognitive impairment (MCI) and APOE has not been established.
Objective: To determine how sex and APOE genotype affect the risks for developing MCI and AD.
Data Sources: Twenty-seven independent research studies in the Global Alzheimer’s Association Interactive Network with data on nearly 58 000 participants.
Study Selection: Non-Hispanic white individuals with clinical diagnostic and APOE genotype data.
Data Extraction and Synthesis: Homogeneous data sets were pooled in case-control analyses, and logistic regression models were used to compute risks.
Main Outcomes and Measures: Age-adjusted odds ratios (ORs) and 95% confidence intervals for developing MCI and AD were calculated for men and women across APOE genotypes.
Results: Participants were men and women between ages 55 and 85 years. Across data sets most participants were white, and for many participants, racial/ethnic information was either not collected or not known. Men (OR, 3.09; 95% CI, 2.79-3.42) and women (OR, 3.31; 95% CI, 3.03-3.61) with the APOE ε3/ε4 genotype from ages 55 to 85 years did not show a difference in AD risk; however, women had an increased risk compared with men between the ages of 65 and 75 years (women, OR, 4.37; 95% CI, 3.82-5.00; men, OR, 3.14; 95% CI, 2.68-3.67; P = .002). Men with APOE ε3/ε4 had an increased risk of AD compared with men with APOE ε3/ε3. The APOE ε2/ε3 genotype conferred a protective effect on women (OR, 0.51; 95% CI, 0.43-0.61) decreasing their risk of AD more (P = .01) than men (OR, 0.71; 95% CI, 0.60-0.85). There was no difference between men with APOE ε3/ε4 (OR, 1.55; 95% CI, 1.36-1.76) and women (OR, 1.60; 95% CI, 1.43-1.81) in their risk of developing MCI between the ages of 55 and 85 years, but women had an increased risk between 55 and 70 years (women, OR, 1.43; 95% CI, 1.19-1.73; men, OR, 1.07; 95% CI, 0.87-1.30; P = .05). There were no significant differences between men and women in their risks for converting from MCI to AD between the ages of 55 and 85 years. Individuals with APOE ε4/ε4 showed increased risks vs individuals with ε3/ε4, but no significant differences between men and women with ε4/ε4 were seen.
Conclusions and Relevance: Contrary to long-standing views, men and women with the APOE ε3/ε4 genotype have nearly the same odds of developing AD from age 55 to 85 years, but women have an increased risk at younger ages.
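
For readers less familiar with the odds ratios and 95% confidence intervals reported in this abstract, the sketch below shows how an unadjusted OR and a Wald interval are obtained from a 2×2 table. It is a simplification: the study reports age-adjusted ORs from logistic regression models, whereas this example uses no covariates, and the counts are hypothetical numbers chosen for illustration, not data from the study.

```python
import math


def odds_ratio_with_ci(exposed_cases, exposed_controls,
                       unexposed_cases, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    z=1.96 gives an approximate 95% interval. All four counts must be
    positive, otherwise the log odds ratio is undefined.
    """
    counts = (exposed_cases, exposed_controls, unexposed_cases, unexposed_controls)
    if min(counts) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: carriers of a risk genotype vs a reference genotype.
or_, lo, hi = odds_ratio_with_ci(
    exposed_cases=300, exposed_controls=200,
    unexposed_cases=400, unexposed_controls=800,
)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 corresponds to a nominally significant association at the chosen level, and comparing two groups (for example, men and women) requires a separate test of the difference rather than a check of whether their intervals overlap.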

Importance: Mutations in APP, PSEN1, and PSEN2 lead to early-onset Alzheimer disease (EOAD) but account for only approximately 11% of EOAD overall, leaving most of the genetic risk for the most severe form of Alzheimer disease unexplained. This extreme phenotype likely harbors highly penetrant risk variants, making it primed for discovery of novel risk genes and pathways for AD.
Objective: To search for rare variants contributing to the risk for EOAD.
Design, Setting, and Participants: In this case-control study, whole-exome sequencing (WES) was performed in 51 non-Hispanic white (NHW) patients with EOAD (age at onset <65 years) from the Alzheimer’s Disease Genetics Consortium. The study was conducted from January 21, 2013, to October 13, 2016.
Main Outcomes and Measures: Alzheimer disease diagnosed according to standard National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer Disease and Related Disorders Association criteria. Association between Alzheimer disease and genetic variants and genes was measured using logistic regression and sequence kernel association test-optimal gene tests, respectively.
Results: Of the 1524 NHW patients with EOAD, 765 (50.2%) were women and mean (SD) age was 60.0 (4.9) years; of the 7046 NHW patients with LOAD, 4171 (59.2%) were women and mean (SD) age was 77.4 (8.6) years; and of the 7001 NHW controls, 4215 (60.2%) were women and mean (SD) age was 77.4 (8.6) years. The gene PSD2, for which multiple unrelated NHW cases had rare missense variants, was significantly associated with EOAD (P = 2.05 × 10^-6; Bonferroni-corrected P value [BP] = 1.3 × 10^-3) and LOAD (P = 6.22 × 10^-6; BP = 4.1 × 10^-3). A missense variant in TCIRG1, present in a NHW patient and segregating in 3 cases of a Hispanic family, was more frequent in EOAD cases (odds ratio [OR], 2.13; 95% CI, 0.99-4.55; P = .06; BP = 0.413), and significantly associated with LOAD (OR, 2.23; 95% CI, 1.37-3.62; P = 7.2 × 10^-4; BP = 5.0 × 10^-3). A missense variant in the LOAD risk gene RIN3 showed suggestive evidence of association with EOAD after Bonferroni correction (OR, 4.56; 95% CI, 1.26-16.48; P = .02, BP = 0.091). In addition, a missense variant in RUFY1 identified in 2 NHW EOAD cases showed suggestive evidence of an association with EOAD as well (OR, 18.63; 95% CI, 1.62-213.45; P = .003; BP = 0.129).
Conclusions and Relevance: The genes PSD2, TCIRG1, RIN3, and RUFY1 all may be involved in endolysosomal transport, a process known to be important to the development of AD. Furthermore, this study identified shared risk genes between EOAD and LOAD similar to previously reported genes, such as SORL1, PSEN2, and TREM2.
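
The Bonferroni-corrected P values (BP) quoted in this abstract follow the usual rule of multiplying a raw P value by the number of tests and capping the result at 1. The sketch below illustrates that rule only. The abstract does not state how many tests each correction covered, so `n_tests` here is a hypothetical input, not a figure from the study.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted P value: min(1, p * number of tests)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return min(1.0, p_value * n_tests)


# Hypothetical example: a raw P value of 0.01 corrected for 5 tests.
adjusted = bonferroni(0.01, 5)
print(f"adjusted P = {adjusted:.3f}")
```

Because the adjustment scales with the number of tests, the same raw P value can survive correction in a small candidate set and fail in a genome-wide scan.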