Date of Degree
MS (Master of Science)
Michael J. Schnieders
Genetic sequences are being collected at an ever-increasing rate due to rapid cost reductions; however, experimental approaches to determine the structure and function of the protein(s) each gene encodes are not keeping pace. Therefore, computational methods are needed to augment experimental structures with comparative (i.e. homology) models, threading new sequences onto homologous structures and using physics-based methods to build residues, loops and domains. In addition, even experimental structure determination relies on analogous first-principles structure refinement and prediction algorithms to place structural elements that are not defined by the data alone.
Computational methods developed to find the global free energy minimum of an amino acid sequence (i.e. the protein folding problem) are increasingly successful, but limitations in accuracy and efficiency remain. Optimization efforts have focused on subsets of systems and environments by utilizing potential energy functions that include fixed charge force fields (Fiser, Do, & Sali, 2000; Jacobson et al., 2004), statistical or knowledge-based potentials (Das & Baker, 2008) and/or potentials incorporating experimental data (Brunger, 2007; Trabuco, Villa, Mitra, Frank, & Schulten, 2008).
Although these methods are widely used, limitations include 1) a target function global minimum that does not correspond to the actual free energy minimum and/or 2) search protocols that are inefficient or not deterministic due to rough energy landscapes characterized by large energy barriers between multiple minima.
Our Global Optimization Using Metadynamics and a Polarizable Force Field (GONDOLA) approach tackles the first limitation by incorporating experimental data (i.e. from X-ray crystallography, CryoEM or NMR experiments) into a hybrid target function that also includes information from a polarizable molecular mechanics force field (Lopes, Roux, & MacKerell, 2009; Ponder & Case, 2003). The second limitation is overcome by adding a time-dependent bias to the objective function, which drives the sampling of conformational space toward unexplored regions (Barducci, Bonomi, & Parrinello, 2011; Zheng, Chen, & Yang, 2008).
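The two ideas in this paragraph can be illustrated with a minimal sketch on a one-dimensional toy landscape. Nothing here is the GONDOLA implementation: the double-well "force field" term, the harmonic "experimental data" restraint, the weight W_DATA, the observed value X_OBS, and every simulation parameter are assumptions chosen only for illustration. The sketch combines the two terms into a hybrid target and deposits Gaussian hills at visited positions (a basic metadynamics-style, time-dependent bias) so that overdamped Langevin dynamics is pushed out of the starting well.

```python
import math
import random

W_DATA = 0.15   # hypothetical weight of the data term in the hybrid target
X_OBS = -1.0    # hypothetical "experimentally observed" coordinate


def target(x):
    """Hybrid target: toy force-field double well plus a data restraint."""
    return (x * x - 1.0) ** 2 + W_DATA * (x - X_OBS) ** 2


def target_grad(x):
    """Analytic derivative of target(x)."""
    return 4.0 * x * (x * x - 1.0) + 2.0 * W_DATA * (x - X_OBS)


def bias_grad(x, centers, height, width):
    """Derivative of the accumulated Gaussian-hill bias at x."""
    grad = 0.0
    for c in centers:
        d = x - c
        grad += -height * d / width ** 2 * math.exp(-d * d / (2.0 * width ** 2))
    return grad


def biased_search(x0, steps=5000, dt=0.01, kT=0.1,
                  height=0.1, width=0.2, stride=10, seed=0):
    """Overdamped Langevin dynamics on target + time-dependent bias.

    Returns the position with the lowest *unbiased* target value seen.
    """
    rng = random.Random(seed)
    x = x0
    centers = []                       # hill centers deposited so far
    best_x, best_e = x, target(x)
    noise = math.sqrt(2.0 * kT * dt)
    for step in range(steps):
        if step % stride == 0:
            centers.append(x)          # the bias grows where we have been
        force = -(target_grad(x) + bias_grad(x, centers, height, width))
        x += force * dt + noise * rng.gauss(0.0, 1.0)
        e = target(x)                  # score with the unbiased target
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e
```

Calling biased_search(1.0) starts the walker in the well that the data term disfavors; the accumulated hills fill that well and let the walker cross the barrier, while the best structure is still selected by the unbiased hybrid target.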
The GONDOLA approach incorporates additional efficiency constructs for search space exploration that include Monte Carlo moves and fine-grained minimization. Furthermore, the dimensionality of the search is reduced by fixing atomic coordinates of known structural regions while atoms of interest explore new coordinate positions. The overall approach can be used for optimization of side-chains (i.e. set side-chain atoms active while constraining backbone atoms), residues (i.e. side-chain atoms and backbone atoms active), ligand binding pose (i.e. set atoms along binding interface active), protein loops (i.e. set atoms connecting two terminating residues active) or even entire protein domains or complexes. Here we focus on using the GONDOLA general free energy driven optimization strategy to elucidate the structural details of protein loops, which are often absent from experimental structures due to conformational heterogeneity and/or limitations in the resolution of the data.
We first show that the correlation between experimental data and AMOEBA (i.e. a polarizable force field) structural minima is stronger than that for OPLS-AA (i.e. a fixed charge force field). This suggests that the higher order multipoles and polarization of the AMOEBA force field represent the true crystalline environment more accurately than the simpler OPLS-AA model. Thus, scoring and optimization of loops with AMOEBA is more accurate than with OPLS-AA, albeit at a slightly increased computational cost.
Next, missing PDZ domain protein loops and protein loops from a loop decoy data set were optimized for 5 ns using the GONDOLA approach (i.e. under the AMOEBA polarizable force field) as well as a commonly used global optimization procedure (i.e. simulated annealing under the OPLS-AA fixed charge force field). The GONDOLA procedure was shown to provide more accurate structures in terms of both experimental metrics (i.e. lower Rfree values) and structural metrics (i.e. using the MolProbity structure validation tool). In terms of Rfree, only one out of seven simulated annealing results was better than the GONDOLA global optimization. Similarly, one simulated annealing loop had a better MolProbity score, but none of the simulated annealing loops were better in both categories. On average, GONDOLA achieved an Rfree value of 19.48 compared with 19.63 for simulated annealing, and the average MolProbity scores were 1.56 for GONDOLA and 1.75 for simulated annealing.
In addition to providing more accurate predictions, GONDOLA was shown to converge much faster than the simulated annealing protocol. Ten separate 5 ns optimizations of the 4 residue loop missing from one of the PDZ domains were conducted: five using GONDOLA and five using the simulated annealing protocol. The four fastest-converging runs all used the GONDOLA approach. Thus, this work demonstrates that GONDOLA is well-suited to refine or predict the coordinates of missing residues and loops because it is both more accurate and converges more rapidly.
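The dimensionality reduction described above, where known regions stay fixed while only atoms of interest move, can be sketched as follows. This is a toy under stated assumptions, not the GONDOLA code: the harmonic chain energy, the finite-difference gradient, and the names chain_energy and minimize_active are hypothetical stand-ins for a real force field and minimizer.

```python
import math


def chain_energy(coords, bond=1.0):
    """Toy energy: harmonic penalty on consecutive-atom distances."""
    return sum((math.dist(a, b) - bond) ** 2
               for a, b in zip(coords, coords[1:]))


def minimize_active(coords, active, steps=200, lr=0.05, eps=1e-5):
    """Gradient descent that moves only the atoms whose index is in `active`.

    Atoms outside `active` keep their input coordinates, so the search
    space has 3 * len(active) dimensions instead of 3 * len(coords).
    """
    coords = [list(c) for c in coords]       # work on a copy
    for _ in range(steps):
        grads = {}
        for i in sorted(active):
            g = []
            for k in range(3):
                orig = coords[i][k]
                coords[i][k] = orig + eps
                e_plus = chain_energy(coords)
                coords[i][k] = orig - eps
                e_minus = chain_energy(coords)
                coords[i][k] = orig          # restore before next component
                g.append((e_plus - e_minus) / (2.0 * eps))
            grads[i] = g
        for i, g in grads.items():           # update active atoms together
            for k in range(3):
                coords[i][k] -= lr * g[k]
    return coords
```

For a loop-like case, the anchor atoms on either side would be left out of `active` and the connecting atoms included; choosing side-chain atoms, a binding interface, or a whole domain only changes which indices are passed in.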
The human genome project sparked a revolution in the availability of low-cost genetic information and dramatically improved our understanding of human health and disease. Many approaches are being explored to assist clinical decision making in light of low-cost genetic information. For missense variants that have not been characterized biochemically, computational approaches capable of predicting the impact on protein structure, function and human phenotype are sorely needed. Such approaches start from accurate protein structures for both wildtype and variant sequences. However, X-ray crystallography, a widely used method for the experimental determination of protein structure, is too time-consuming to be applied to all missense variants of clinical interest. Therefore, computational methods to augment experimental structures with comparative (i.e. homology) models based on physics-based methods for building missing residues, loops and domains of protein structures are imperative.
Here we propose an algorithm called GONDOLA to predict the atomic coordinates of protein residues, loops and domains. GONDOLA uses a state-of-the-art polarizable force field called AMOEBA to describe the interactions between atoms, and molecular dynamics with a time-dependent bias to drive efficient global optimization of the structure. The approach improves the quality of both experimental protein structures and those generated from homology modeling relative to existing methods. We demonstrate the power of GONDOLA by showing that it converges more rapidly than global optimization by simulated annealing, while also providing more accurate protein loops based on both experimental and physical criteria.
Inverse Kinematics, Loops, Metadynamics, Optimization, Polarizable Force Field, Protein
Copyright 2016 Armin Avdic