Blog

B cells are subjected to selection at multiple checkpoints during their development. The selection of Ab H chains is difficult to study because of the large diversity of the CDR3. To study the selection of individual Ab H chain V region genes (V(H)), we performed CDR3 spectratyping of ∼75-300 rearrangements per individual V(H) in C57BL/6J mice. We measured the fraction of rearrangements that were in-frame in B cell DNA. We demonstrate that individual V(H)s have different fractions of in-frame rearrangements (IF fractions) ranging from 10 to 90% and that these IF fractions are reproducible in different mice. For most V(H)s, the IF fraction in pro-B cells approximated 33% and then shifted to the nearly final (mature) B cell value by the cycling pre-B cell stage. The frequency of high in-frame (IF) V(H) usage increased in cycling pre-B cells compared with that in pro-B cells, whereas this did not occur for low IF V(H)s. The IF fraction did not shift as much in BCR-expressing B cells and was minimally affected by L chain usage for most V(H). High IF clan II/III V(H)s share more positively charged CDR2 sequences, whereas high IF clan I J558 CDR2 sequences are diverse. These data indicate that individual V(H)s are subjected to differential selection, that V(H) IF fraction is mainly established through pre-BCR-mediated selection, that it may operate differently in clan I versus II/III V(H)s, and that it has a lasting influence on the Ab repertoire.
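
As a rough illustration of the IF fraction, the sketch below counts how many rearrangements have a length that differs from a known in-frame reference length by a multiple of 3 nucleotides. The function name, the `reference_length` parameter, and the fragment lengths are assumptions made for illustration, not the study's pipeline or data. Without selection, about one rearrangement in three is expected to be in-frame, which is consistent with the ∼33% reported for pro-B cells.

```python
def in_frame_fraction(lengths, reference_length):
    """Fraction of rearrangements whose length differs from an in-frame
    reference length by a multiple of 3 nucleotides."""
    if not lengths:
        raise ValueError("no rearrangements supplied")
    in_frame = sum(1 for n in lengths if (n - reference_length) % 3 == 0)
    return in_frame / len(lengths)


# Hypothetical fragment lengths (nt) for one V(H); not data from the study.
example_lengths = [120, 121, 123, 123, 125, 126, 126, 129, 130, 132]
fraction = in_frame_fraction(example_lengths, reference_length=120)
print(f"IF fraction: {fraction:.2f}")
```

A V(H) whose IF fraction sits well above one third in mature B cells would, on this reading, have been enriched for in-frame rearrangements by selection.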

Multiple sequence alignment is typically the first step in estimating phylogenetic trees, with the assumption being that as alignments improve, so will phylogenetic reconstructions. Over the last decade or so, new multiple sequence alignment methods have been developed to improve comparative analyses of protein structure, but these new methods have not been typically used in phylogenetic analyses. In this paper, we report on a simulation study that we performed to evaluate the consequences of using these new multiple sequence alignment methods in terms of the resultant phylogenetic reconstruction. We find that while alignment accuracy is positively correlated with phylogenetic accuracy, the amount of improvement in phylogenetic estimation that results from an improved alignment can range from quite small to substantial. We observe that phylogenetic accuracy is most highly correlated with alignment accuracy when sequences are most difficult to align, and that variation in alignment accuracy can have little impact on phylogenetic accuracy when alignment error rates are generally low. We discuss these observations and implications for future work.

Progressive supranuclear palsy (PSP) is a movement disorder with prominent tau neuropathology. Brain diseases with abnormal tau deposits are called tauopathies, the most common of which is Alzheimer’s disease. Environmental causes of tauopathies include repetitive head trauma associated with some sports. To identify common genetic variation contributing to risk for tauopathies, we carried out a genome-wide association study of 1,114 individuals with PSP (cases) and 3,247 controls (stage 1) followed by a second stage in which we genotyped 1,051 cases and 3,560 controls for the stage 1 SNPs that yielded P ≤ 10(-3). We found significant previously unidentified signals (P < 5 × 10(-8)) associated with PSP risk at STX6, EIF2AK3 and MOBP. We confirmed two independent variants in MAPT affecting risk for PSP, one of which influences MAPT brain expression. The genes implicated encode proteins for vesicle-membrane fusion at the Golgi-endosomal interface, for the endoplasmic reticulum unfolded protein response and for a myelin structural component.

The Alzheimer Disease Genetics Consortium (ADGC) performed a genome-wide association study of late-onset Alzheimer disease using a three-stage design consisting of a discovery stage (stage 1) and two replication stages (stages 2 and 3). Both joint analysis and meta-analysis approaches were used. We obtained genome-wide significant results at MS4A4A (rs4938933; stages 1 and 2, meta-analysis P (P(M)) = 1.7 × 10(-9), joint analysis P (P(J)) = 1.7 × 10(-9); stages 1, 2 and 3, P(M) = 8.2 × 10(-12)), CD2AP (rs9349407; stages 1, 2 and 3, P(M) = 8.6 × 10(-9)), EPHA1 (rs11767557; stages 1, 2 and 3, P(M) = 6.0 × 10(-10)) and CD33 (rs3865444; stages 1, 2 and 3, P(M) = 1.6 × 10(-9)). We also replicated previous associations with late-onset Alzheimer’s disease susceptibility at CR1 (rs6701713; P(M) = 4.6 × 10(-10), P(J) = 5.2 × 10(-11)), CLU (rs1532278; P(M) = 8.3 × 10(-8), P(J) = 1.9 × 10(-8)), BIN1 (rs7561528; P(M) = 4.0 × 10(-14), P(J) = 5.2 × 10(-14)) and PICALM (rs561655; P(M) = 7.0 × 10(-11), P(J) = 1.0 × 10(-10)), but not at EXOC3L2.
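
The abstract does not say how the per-stage results were combined in the meta-analysis. One common choice, shown here only as a sketch, is a fixed-effects inverse-variance meta-analysis of per-stage log odds ratios. The function name and the input values are hypothetical and are not taken from the ADGC data.

```python
import math


def inverse_variance_meta(estimates):
    """Combine (beta, standard_error) pairs with inverse-variance weights.

    Returns the pooled beta, pooled standard error, z score and
    two-sided P value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P
    return pooled_beta, pooled_se, z, p_value


# Hypothetical per-stage log odds ratios and standard errors (not study data).
stages = [(-0.12, 0.03), (-0.10, 0.04), (-0.09, 0.05)]
beta, se, z, p = inverse_variance_meta(stages)
print(f"pooled OR = {math.exp(beta):.3f}, z = {z:.2f}, P = {p:.2e}")
```

Pooling shrinks the standard error below that of any single stage, which is why a signal that is only suggestive in stage 1 can reach genome-wide significance once replication stages are added.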

OBJECTIVES: To determine whether genotypes at CLU, PICALM, and CR1 confer risk for Alzheimer disease (AD) and whether risk for AD associated with these genes is influenced by apolipoprotein E (APOE) genotypes.
DESIGN: Association study of AD and CLU, PICALM, CR1, and APOE genotypes.
SETTING: Academic research institutions in the United States, Canada, and Israel.
PARTICIPANTS: Seven thousand seventy cases with AD, 3055 with autopsies, and 8169 elderly cognitively normal controls, 1092 with autopsies, from 12 different studies, including white, African American, Israeli-Arab, and Caribbean Hispanic individuals.
RESULTS: Unadjusted, CLU (odds ratio [OR], 0.91; 95% confidence interval [CI], 0.85-0.96 for single-nucleotide polymorphism [SNP] rs11136000), CR1 (OR, 1.14; 95% CI, 1.07-1.22; SNP rs3818361), and PICALM (OR, 0.89; 95% CI, 0.84-0.94, SNP rs3851179) were associated with AD in white individuals. None were significantly associated with AD in the other ethnic groups. APOE ε4 was significantly associated with AD (ORs, 1.80-9.05) in all but 1 small white cohort and in the Arab cohort. Adjusting for age, sex, and the presence of at least 1 APOE ε4 allele greatly reduced evidence for association with PICALM but not CR1 or CLU. Models with the main SNP effect, presence or absence of APOE ε4, and an interaction term showed significant interaction between presence or absence of APOE ε4 and PICALM.
CONCLUSIONS: We confirm in a completely independent data set that CR1, CLU, and PICALM are AD susceptibility loci in European ancestry populations. Genotypes at PICALM confer risk predominantly in APOE ε4-positive subjects. Thus, APOE and PICALM synergistically interact.
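
For readers less familiar with the odds ratios and confidence intervals quoted above, the following sketch computes an unadjusted odds ratio and its 95% confidence interval from a 2×2 table using the standard log (Woolf) method. The counts and names are hypothetical; the study's own estimates may have been obtained from regression models rather than from a single table.

```python
import math


def odds_ratio_ci(case_exposed, case_unexposed, control_exposed, control_unexposed, z=1.96):
    """Odds ratio and confidence interval from a 2x2 table (Woolf's log method)."""
    odds_ratio = (case_exposed * control_unexposed) / (case_unexposed * control_exposed)
    se_log_or = math.sqrt(
        1 / case_exposed + 1 / case_unexposed + 1 / control_exposed + 1 / control_unexposed
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (not study data): minor vs. major allele in cases and controls.
or_value, ci_low, ci_high = odds_ratio_ci(400, 600, 450, 550)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

An interval that excludes 1, as for the three SNPs in white individuals above, corresponds to a nominally significant association at the 5% level.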

BACKGROUND: Alzheimer’s disease (AD) is common and highly heritable with many genes and gene variants associated with AD in one or more studies, including APOE ε2/ε3/ε4. However, the genetic backgrounds for normal cognition, mild cognitive impairment (MCI) and AD in terms of changes in cerebrospinal fluid (CSF) levels of Aβ1-42, T-tau, and P-tau181P, have not been clearly delineated. We carried out a genome-wide association study (GWAS) in order to better define the genetic backgrounds to these three states in relation to CSF levels.
METHODS: Subjects were participants in the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The GWAS dataset consisted of 818 participants (mainly Caucasian) genotyped using the Illumina Human Genome 610 Quad BeadChips. This sample included 410 subjects (119 Normal, 115 MCI and 176 AD) with measurements of CSF Aβ1-42, T-tau, and P-tau181P levels. We used PLINK to find genetic associations with the three CSF biomarker levels. Association of each of the 498,205 SNPs was tested using additive, dominant, and general association models while considering APOE genotype and age. Finally, an effort was made to better identify relevant biochemical pathways for associated genes using the ALIGATOR software.
RESULTS: We found that there were some associations with APOE genotype although CSF levels were about the same for each subject group; CSF Aβ1-42 levels decreased with APOE gene dose for each subject group. T-tau levels tended to be higher among AD cases than among normal subjects. In the adjusted analysis, with APOE genotype and age as covariates, no SNP was associated with CSF levels among AD subjects. CYP19A1 ‘aromatase’ (rs2899472), NCAM2, and multiple SNPs located on chromosome 10 near the ARL5B gene demonstrated the strongest associations with Aβ1-42 in normal subjects. Two genes found to be near the top SNPs, CYP19A1 (rs2899472, p = 1.90 × 10(-7)) and NCAM2 (rs1022442, p = 2.75 × 10(-7)), have been reported in previous studies as genetic factors related to the progression of AD. In AD subjects, APOE ε2/ε3 and ε2/ε4 genotypes were associated with elevated T-tau levels and ε4/ε4 genotype was associated with elevated T-tau and P-tau181P levels. Pathway analysis detected several biological pathways associated with CSF β-amyloid peptide (Aβ1-42) levels in normal subjects.
CONCLUSIONS: Our genome-wide association analysis identified several SNPs as important factors for CSF biomarker levels. We also provide new evidence for additional candidate genetic risk factors from pathway analysis that can be tested in further studies.
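
The three association models named in the methods differ only in how a genotype is turned into predictor variables. The sketch below shows the usual codings; the function name is hypothetical and this is not PLINK's interface, only an illustration of what each model assumes.

```python
def encode_genotype(minor_allele_count, model):
    """Encode a genotype (0, 1 or 2 copies of the minor allele) for regression.

    additive: one predictor, the allele count (each copy has the same effect)
    dominant: one predictor, 1 if at least one copy is present
    general:  two indicator predictors (heterozygote, minor homozygote)
    """
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1 or 2")
    if model == "additive":
        return (minor_allele_count,)
    if model == "dominant":
        return (1 if minor_allele_count >= 1 else 0,)
    if model == "general":
        return (1 if minor_allele_count == 1 else 0,
                1 if minor_allele_count == 2 else 0)
    raise ValueError(f"unknown model: {model}")


for count in (0, 1, 2):
    row = {m: encode_genotype(count, m) for m in ("additive", "dominant", "general")}
    print(count, row)
```

In an analysis like the one described, covariates such as age and APOE genotype would be added as further columns alongside these predictors.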

The functional structure of all biologically active molecules is dependent on intra- and inter-molecular interactions. This is especially evident for RNA molecules whose functionality, maturation, and regulation require formation of correct secondary structure through encoded base-pairing interactions. Unfortunately, intra- and inter-molecular base-pairing information is lacking for most RNAs. Here, we marry classical nuclease-based structure mapping techniques with high-throughput sequencing technology to interrogate all base-paired RNA in Arabidopsis thaliana and identify ∼200 new small (sm)RNA-producing substrates of RNA-DEPENDENT RNA POLYMERASE6. Our comprehensive analysis of paired RNAs reveals conserved functionality within introns and both 5′ and 3′ untranslated regions (UTRs) of mRNAs, as well as a novel population of functional RNAs, many of which are the precursors of smRNAs. Finally, we identify intra-molecular base-pairing interactions to produce a genome-wide collection of RNA secondary structure models. Although our methodology reveals the pairing status of RNA molecules in the absence of cellular proteins, previous studies have demonstrated that structural information obtained for RNAs in solution accurately reflects their structure in ribonucleoprotein complexes. Furthermore, our identification of RNA-DEPENDENT RNA POLYMERASE6 substrates and conserved functional RNA domains within introns and both 5′ and 3′ untranslated regions (UTRs) of mRNAs using this approach strongly suggests that RNA molecules are correctly folded into their secondary structure in solution. Overall, our findings highlight the importance of base-paired RNAs in eukaryotes and present an approach that should be widely applicable for the analysis of this key structural feature of RNA.

BACKGROUND: Human brain aging has received special attention in part because of the elevated risks of neurodegenerative disorders such as Alzheimer’s disease in seniors. Recent technological advances enable us to investigate whether similar mechanisms underlie aging and neurodegeneration, by quantifying the similarities and differences in their genome-wide gene expression profiles.
PRINCIPAL FINDINGS: We have developed a computational method for assessing an individual’s “physiological brain age” by comparing global mRNA expression datasets across a range of normal human brain samples. Application of this method to brain samples from select regions in two diseases, Alzheimer’s disease (AD, superior frontal gyrus) and frontotemporal lobar degeneration (FTLD, rostral aspect of frontal cortex, ∼BA10), showed that while control cohorts exhibited no significant difference between physiological and chronological ages, FTLD and AD exhibited prematurely aged expression profiles.
CONCLUSIONS: This study establishes a quantitative scale for measuring premature aging in neurodegenerative disease cohorts, and it identifies specific physiological mechanisms common to aging and some forms of neurodegeneration. In addition, accelerated expression profiles associated with AD and FTLD suggest some common mechanisms underlying the risk of developing these diseases.

BACKGROUND: Genome-wide studies on autism spectrum disorders (ASDs) have mostly focused on large-scale population samples, but examination of rare variations in isolated populations may provide additional insights into the disease pathogenesis.
METHODS: As a first step in the genetic analysis of ASD in Croatia, we characterized genetic variation in a sample of 103 subjects with ASD and 203 control individuals, who were genotyped using the Illumina HumanHap550 BeadChip. We analyzed the genetic diversity of the Croatian population and its relationship to other populations, the degree of relatedness via Runs of Homozygosity (ROHs), and the distribution of large (>500 Kb) copy number variations.
RESULTS: Combining the Croatian cohort with several previously published populations in the FastME analysis (an alternative to Neighbor Joining) revealed that Croatian subjects cluster, as expected, with Southern Europeans; in addition, individuals from the same geographic region within Europe cluster together. Whereas Croatian subjects could be separated from a sample of healthy control subjects of European origin from North America, Croatian ASD cases and controls are well mixed. A comparison of runs of homozygosity indicated that the number and the median length of regions of homozygosity are higher for ASD subjects than for controls (p = 6 × 10(-3)). Furthermore, analysis of copy number variants found a higher frequency of large chromosomal rearrangements (>2 Mb) in ASD cases (5/103) than in ethnically matched control subjects (1/197, p = 0.019).
CONCLUSIONS: Our findings illustrate the remarkable utility of high-density genotype data for subjects from a limited geographic area in dissecting genetic heterogeneity with respect to population and disease related variation.
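
The abstract does not name the test behind the comparison of large rearrangements (5/103 cases versus 1/197 controls). As a sketch under the assumption that a Fisher exact test was used, the counts can be checked with the standard library alone; the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def table_probability(k):
        return comb(row1, k) * comb(row2, col1 - k) / denominator

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    observed = table_probability(a)
    return sum(
        table_probability(k)
        for k in range(k_min, k_max + 1)
        if table_probability(k) <= observed * (1 + 1e-9)
    )


# Counts from the abstract: 5 of 103 ASD cases vs. 1 of 197 controls.
p_value = fisher_exact_two_sided(5, 98, 1, 196)
print(f"P = {p_value:.3f}")
```

With these counts the sketch gives a P value of roughly 0.019, in line with the figure reported in the results.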

MOTIVATION: The rapid development of genotyping technology and extensive cataloguing of single nucleotide polymorphisms (SNPs) across the human genome have made genetic association studies the mainstream for gene mapping of complex human diseases. For many diseases, the most practical approach is the population-based design with unrelated individuals. Although having the advantages of easier sample collection and greater power than family-based designs, unrecognized population stratification in the study samples can lead to both false-positive and false-negative findings and might obscure the true association signals if not appropriately corrected.
METHODS: We report PHYLOSTRAT, a new method that corrects for population stratification by combining phylogeny constructed from SNP genotypes and principal coordinates from multi-dimensional scaling (MDS) analysis. This hybrid approach efficiently captures both discrete and admixed population structures.
RESULTS: By extensive simulations, the analysis of a synthetic genome-wide association dataset created using data from the Human Genome Diversity Project, and the analysis of a lactase-height dataset, we show that our method can correct for population stratification more efficiently than several existing population stratification correction methods, including EIGENSTRAT, a hybrid approach based on MDS and clustering, and STRATSCORE, in terms of requiring fewer random SNPs for inference of population structure. By combining the flexibility and hierarchical nature of phylogenetic trees with the advantage of representing admixture using MDS, our hybrid approach can capture the complex population structures in human populations effectively.
SOFTWARE AVAILABILITY: Code can be downloaded from http://people.pcbi.upenn.edu/~lswang/phylostrat/
CONTACT: mingyao@upenn.edu; iswang@upenn.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.