Blog

The human Werner and Bloom syndromes (WS and BS) are caused by deficiencies in the WRN and BLM RecQ helicases, respectively. WRN, BLM and their Saccharomyces cerevisiae homologue Sgs1 are particularly active in vitro in unwinding G-quadruplex DNA (G4-DNA), a family of non-canonical nucleic acid structures formed by certain G-rich sequences. Recently, mRNA levels from loci containing potential G-quadruplex-forming sequences (PQS) were found to be preferentially altered in sgs1Δ mutants, suggesting that G4-DNA targeting by Sgs1 directly affects gene expression. Here, we extend these findings to human cells. Using microarrays to measure mRNAs obtained from human fibroblasts deficient for various RecQ family helicases, we observe significant associations between loci that are upregulated in WS or BS cells and loci that have PQS. However, no such PQS associations were observed for control expression datasets. Furthermore, upregulated genes in WS and BS showed no or dramatically reduced associations with sequences that resemble PQS but have considerably reduced potential to form intramolecular G4-DNA. These findings indicate that, like Sgs1, WRN and BLM can regulate transcription globally by targeting G4-DNA.

The problem of Alzheimer’s disease (AD) exemplifies the challenges of dealing with a broad range of aging-related chronic disorders that require long-term, labor-intensive, and expensive care. As the baby boom generation ages and brain diseases become more prevalent, the need to confront the pending health care crisis is more urgent than ever before. Indeed, there is now a critical need to expand significantly the national effort to solve the problem of AD, with special focus on prevention. The Campaign to Prevent Alzheimer’s Disease by 2020 (PAD2020) aims to create a new paradigm for planning and supporting the organization of worldwide cooperative research networks to develop new technologies for early detection and treatments of aging-related memory and motor impairments. PAD2020 is developing an implementation plan to justify (1) increasing the federal budget for research, (2) developing novel national resources to discover new interventions for memory and motor disorders, and (3) creating innovative and streamlined decision-making processes for selecting and supporting new ideas. Since 1978, the National Institute on Aging of the National Institutes of Health (NIH) has established an extensive national network of AD research facilities at academic institutions, including AD Centers (ADCs), Consortium to Establish a Registry for AD, AD Cooperative Study (ADCS), AD Drug Discovery Program, National Alzheimer’s Coordinating Center, National Cell Repository for AD, and AD Neuroimaging Initiative. However, despite the success of these programs and their critical contributions, they are no longer adequate to meet the challenges presented by AD. PAD2020 is designed to address these challenges by improving the efficiency and effectiveness of these programs.
For example, the ADCs (P30s and P50s) can be enhanced by converting some into Comprehensive Alzheimer’s Disease Centers (CADCs) that not only support research but also serve as demonstration projects on care/treatment, clinical trials, and education, and that seamlessly integrate multisite collaborative studies (ADCS, AD Neuroimaging Initiative, Patient Registries, Clinical Data Banks, etc.) into a cohesive structure that further enhances the original mission of the National Institute on Aging ADCs. Regional CADCs offer greater efficiency and cost savings while serving as coordinating hubs of existing ADCs, thereby offering greater economies of scale and programmatic integration. The CADCs also broaden the scope of ADC activities to include research on interventions, diagnosis, imaging, prevention trials, and other longitudinal studies that require long-term support. Thus, CADCs can address the urgent need to identify subjects at high risk of AD for prevention trials and subjects very early in the course of AD for clinical trials of disease modification. The enhanced CADCs will allow more flexibility among ADCs by supporting collaborative linkages with other institutions and drawing on a wider expertise from different locations. This perspective article describes the University of Pennsylvania (Penn) CADC Model as an illustrative example of how an existing ADC can be converted into a CADC by better utilization of Penn academic resources to address the wide range of problems concerning AD. The intent of this position paper is to stimulate thinking and foster the development of other or alternative models for a systematic approach to the study of dementia and movement disorders.

Frontotemporal lobar degeneration (FTLD) is the second most common cause of presenile dementia. The predominant neuropathology is FTLD with TAR DNA-binding protein (TDP-43) inclusions (FTLD-TDP). FTLD-TDP is frequently familial, resulting from mutations in GRN (which encodes progranulin). We assembled an international collaboration to identify susceptibility loci for FTLD-TDP through a genome-wide association study of 515 individuals with FTLD-TDP. We found that FTLD-TDP associates with multiple SNPs mapping to a single linkage disequilibrium block on 7p21 that contains TMEM106B. Three SNPs retained genome-wide significance following Bonferroni correction (top SNP rs1990622, P = 1.08 × 10^-11; odds ratio, minor allele (C) 0.61, 95% CI 0.53-0.71). The association replicated in 89 FTLD-TDP cases (rs1990622; P = 2 × 10^-4). TMEM106B variants may confer risk of FTLD-TDP by increasing TMEM106B expression. TMEM106B variants also contribute to genetic risk for FTLD-TDP in individuals with mutations in GRN. Our data implicate variants in TMEM106B as a strong risk factor for FTLD-TDP, suggesting an underlying pathogenic mechanism.

Sequences with the potential to form intramolecular G-quadruplexes (G4-structures) are found in highly nonrandom distributions in the genomes of diverse organisms. These sequences are associated with nucleic acid metabolic processes ranging from transcription and translation to recombination and telomere function. Here we review different computational methods for identifying potential G4-forming sequences and provide protocols for their implementation. We also discuss methods for assessing the significance and specificity of associations between the sequences and different biological functions.
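As a rough illustration of what such a computational search can look like, the sketch below scans a DNA string with a regular expression for one widely used motif definition: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The motif definition, the function name `find_pqs`, and the example sequence are assumptions made for illustration; they are not taken from the review's protocols, and other run and loop lengths are also used in practice.

```python
import re

# Assumed motif: G{3+} N{1-7} G{3+} N{1-7} G{3+} N{1-7} G{3+}.
# The lookahead lets overlapping candidates starting at different
# positions be reported instead of only non-overlapping matches.
PQS_PATTERN = re.compile(
    r"(?=(G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}))"
)


def find_pqs(sequence):
    """Return (start, matched_text) for each candidate on the given strand."""
    sequence = sequence.upper()
    return [(m.start(1), m.group(1)) for m in PQS_PATTERN.finditer(sequence)]


if __name__ == "__main__":
    example = "TTGGGAGGGTGGGAGGGTT"  # hypothetical test sequence
    for start, motif in find_pqs(example):
        print(start, motif)
```

This only scans the strand it is given; candidates on the complementary strand could be found by running the same function on the reverse complement.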

CD8 T cells, which have a crucial role in immunity to infection and cancer, are maintained in constant numbers, but on antigen stimulation undergo a developmental program characterized by distinct phases encompassing the expansion and then contraction of antigen-specific effector (T(E)) populations, followed by the persistence of long-lived memory (T(M)) cells. Although this predictable pattern of CD8 T-cell responses is well established, the underlying cellular mechanisms regulating the transition to T(M) cells remain undefined. Here we show that tumour necrosis factor (TNF) receptor-associated factor 6 (TRAF6), an adaptor protein in the TNF-receptor and interleukin-1R/Toll-like receptor superfamily, regulates CD8 T(M)-cell development after infection by modulating fatty acid metabolism. We show that mice with a T-cell-specific deletion of TRAF6 mount robust CD8 T(E)-cell responses, but have a profound defect in their ability to generate T(M) cells that is characterized by the disappearance of antigen-specific cells in the weeks after primary immunization. Microarray analyses revealed that TRAF6-deficient CD8 T cells exhibit altered expression of genes that regulate fatty acid metabolism. Consistent with this, activated CD8 T cells lacking TRAF6 display defective AMP-activated kinase activation and mitochondrial fatty acid oxidation (FAO) in response to growth factor withdrawal. Administration of the anti-diabetic drug metformin restored FAO and CD8 T(M)-cell generation in the absence of TRAF6. This treatment also increased CD8 T(M) cells in wild-type mice, and consequently was able to considerably improve the efficacy of an experimental anti-cancer vaccine.

BACKGROUND: Large-scale statistical analyses have become hallmarks of post-genomic era biological research due to advances in high-throughput assays and the integration of large biological databases. One accompanying issue is the simultaneous estimation of p-values for a large number of hypothesis tests. In many applications, a parametric assumption about the null distribution, such as normality, may be unreasonable, and resampling-based p-values are the preferred procedure for establishing statistical significance. Using resampling-based procedures for multiple testing is computationally intensive and typically requires large numbers of resamples.
RESULTS: We present a new approach to more efficiently assign resamples (such as bootstrap samples or permutations) within a nonparametric multiple testing framework. We formulated a Bayesian-inspired approach to this problem, and devised an algorithm that adapts the assignment of resamples iteratively with negligible space and running time overhead. In two experimental studies, a breast cancer microarray dataset and a genome-wide association study dataset for Parkinson’s disease, we demonstrated that our differential allocation procedure is substantially more accurate than the traditional uniform resample allocation.
CONCLUSION: Our experiments demonstrate that using a more sophisticated allocation strategy can improve our inference for hypothesis testing without a drastic increase in the amount of computation on randomized data. Moreover, we gain more improvement in efficiency when the number of tests is large. R code for our algorithm and the shortcut method is available at http://people.pcbi.upenn.edu/~lswang/pub/bmc2009/.
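For readers unfamiliar with resampling-based p-values, the following minimal sketch shows a permutation p-value for a single two-group test, with a fixed number of resamples (the "uniform" baseline, applied to one test). It is not the paper's adaptive allocation algorithm, which is available as R code at the URL above; the function name, the test statistic (absolute difference in means), and the default of 999 resamples are assumptions chosen for illustration.

```python
import random


def permutation_p_value(x, y, n_resamples=999, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    pooled = list(x) + list(y)
    n_x = len(x)

    def abs_mean_diff(values):
        a, b = values[:n_x], values[n_x:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(pooled)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # random relabelling of the group membership
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(pooled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    print(permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4]))
```

Running this for thousands of tests at the same fixed `n_resamples` is where the cost described in the abstract comes from, since most resamples are spent on tests whose significance is already clear.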

Bipolar disorder (BPD) is a common psychiatric illness with a complex mode of inheritance. Besides traditional linkage and association studies, which require large sample sizes, analysis of common and rare chromosomal copy number variants (CNVs) in extended families may provide novel insights into the genetic susceptibility of complex disorders. Using the Illumina HumanHap550 BeadChip with over 550,000 SNP markers, we genotyped 46 individuals in a three-generation Old Order Amish pedigree with 19 affected (16 BPD and three major depression) and 27 unaffected subjects. Using the PennCNV algorithm, we identified 50 CNV regions that ranged in size from 12 to 885 kb and encompassed at least 10 single nucleotide polymorphisms (SNPs). Of 19 well-characterized CNV regions that were available for combined genotype-expression analysis, 11 (58%) were associated with expression changes of genes within, partially within or near these CNV regions in fibroblasts or lymphoblastoid cell lines at a nominal P value <0.05. To further investigate the mode of inheritance of CNVs in the large pedigree, we analyzed a set of four CNVs, located at 6q27, 9q21.11, 12p13.31 and 15q11, all of which were enriched in subjects with affective disorders. We additionally show that these variants affect the expression of neuronal genes within or near the rearrangement. Our analysis suggests that family-based studies of the combined effect of common and rare CNVs at many loci may represent a useful approach in the genetic analysis of disease susceptibility of mental disorders.

Although well studied in vitro, the in vivo functions of G-quadruplexes (G4-DNA and G4-RNA) are only beginning to be defined. Recent studies have demonstrated enrichment for sequences with intramolecular G-quadruplex forming potential (QFP) in transcriptional promoters of humans, chickens and bacteria. Here we survey the yeast genome for QFP sequences and similarly find strong enrichment for these sequences in upstream promoter regions, as well as weaker but significant enrichment in open reading frames (ORFs). Further, four findings are consistent with roles for QFP sequences in transcriptional regulation. First, QFP is correlated with upstream promoter regions with low histone occupancy. Second, treatment of cells with N-methyl mesoporphyrin IX (NMM), which binds G-quadruplexes selectively in vitro, causes significant upregulation of loci with QFP-possessing promoters or ORFs. NMM also causes downregulation of loci connected with the function of the ribosomal DNA (rDNA), which itself has high QFP. Third, ORFs with QFP are selectively downregulated in sgs1 mutants that lack the G4-DNA-unwinding helicase Sgs1p. Fourth, a screen for yeast mutants that enhance or suppress growth inhibition by NMM revealed enrichment for chromatin and transcriptional regulators, as well as telomere maintenance factors. These findings raise the possibility that QFP sequences form bona fide G-quadruplexes in vivo and thus regulate transcription.

A transcriptional module (TM) is a collection of transcription factors (TFs) that, as a group, co-regulate multiple, functionally related genes. The task of identifying TMs poses an important biological challenge. Since TFs belong to evolutionarily and structurally related families, TF family members often bind to similar DNA motifs and can confound sequence-based approaches to TM identification. A previous approach to TM detection addresses this issue by pre-selecting a single representative from each TF family. One problem with this approach is that closely related transcription factors can still target sufficiently distinct genes in a biologically meaningful way, and thus, pre-selecting a single family representative may in principle miss certain TMs. Here we report a method, TREMOR (Transcriptional Regulatory Module Retriever). This method uses the Mahalanobis distance to assess the validity of a TM and automatically incorporates the inter-TF binding similarity without resorting to pre-selecting family representatives. Applying TREMOR to human muscle-specific, liver-specific and cell-cycle-related genes reveals TFs and TMs that are supported by the literature and also reveals additional related genes.
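The Mahalanobis distance mentioned above measures how far a vector lies from a mean once correlations between coordinates are taken into account: d(x) = sqrt((x - μ)ᵀ Σ⁻¹ (x - μ)). The sketch below computes it in plain Python. How TREMOR builds its vectors and covariance matrix is not described in this abstract, so the inputs here are hypothetical placeholders, and the linear solver is a generic choice rather than anything taken from the method.

```python
import math


def mahalanobis(x, mean, cov):
    """Mahalanobis distance of x from mean under covariance matrix cov."""
    n = len(x)
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Solve cov * z = diff by Gauss-Jordan elimination with partial
    # pivoting, which avoids forming the inverse matrix explicitly.
    aug = [list(row) + [diff[i]] for i, row in enumerate(cov)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    z = [aug[i][n] / aug[i][i] for i in range(n)]
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))


if __name__ == "__main__":
    # Hypothetical two-dimensional scores with correlated coordinates.
    print(mahalanobis([2.0, 1.0], [0.0, 0.0], [[2.0, 0.5], [0.5, 1.0]]))
```

With an identity covariance matrix the result reduces to the ordinary Euclidean distance, which is a convenient sanity check.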

Evolution operates on whole genomes through direct rearrangements of genes, such as inversions, transpositions, and inverted transpositions, as well as through operations, such as duplications, losses, and transfers, that also affect the gene content of the genomes. Because these events are rare relative to nucleotide substitutions, gene order data offer the possibility of resolving ancient branches in the tree of life; the combination of gene order data with sequence data also has the potential to provide more robust phylogenetic reconstructions, since each can elucidate evolution at different time scales. Distance corrections greatly improve the accuracy of phylogeny reconstructions from DNA sequences, enabling distance-based methods to approach the accuracy of the more elaborate methods based on parsimony or likelihood at a fraction of the computational cost. This paper focuses on developing distance correction methods for phylogeny reconstruction from whole genomes. The main question we investigate is how to estimate evolutionary histories from whole genomes with equal gene content, and we present a technique, the empirically derived estimator (EDE), that we have developed for this purpose. We study the use of EDE on whole genomes with identical gene content, and we explore the accuracy of phylogenies inferred using EDE with the neighbor joining and minimum evolution methods under a wide range of model conditions. Our study shows that tree reconstruction under these two methods is much more accurate when based on EDE distances than when based on other distances previously suggested for whole genomes.
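The abstract does not name a specific distance correction for DNA sequences. A standard textbook example is the Jukes-Cantor correction, which converts the observed fraction p of differing sites into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3), to account for multiple substitutions at the same site. The sketch below shows only this sequence-based illustration; it is not the EDE estimator, whose formula is not given here, and the function name is an assumption.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance from observed mismatch fraction p."""
    if not 0.0 <= p < 0.75:
        # At p >= 0.75 the sequences look saturated and the
        # logarithm is undefined, so no finite estimate exists.
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for p in (0.05, 0.10, 0.30, 0.60):
        print(p, round(jukes_cantor_distance(p), 4))
```

The corrected value grows faster than p as divergence increases, which is the effect a correction is meant to capture: raw distances underestimate the true amount of change between distant taxa.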