Mutations in monogenic disease and cancer:
The majority of mutations that cause rare monogenic disease and cancer are missense single DNA base changes (resulting in an amino acid substitution). Missense mutations have a wide range of effects, from completely benign to highly pathogenic. Making full use of the wealth of new sequence information on these classes of disease therefore requires reliable computational methods of assigning pathogenicity. The Moult group has developed machine learning methods for estimating the impact of these mutations both in terms of fitness and, when three-dimensional structure is available, at the molecular level. We are particularly interested in the relationship between the impact of mutations at the molecular level and disease phenotypes. New work, supported by an NIH R01 (September 2016), is focused on the optimum use of three-dimensional protein structure to identify the molecular mechanisms by which mutations affect function, and on improved estimates of reliability so as to increase utility in the clinic. A major component is expert sourcing of function annotation. A set of human protein models will be built to maximize use of structure, in collaboration with Krzysztof Fidelis at UC Davis. Results will be made available through Proteopedia in collaboration with Joel Sussman at the Weizmann Institute.
Yue, P. and Moult, J. (2006). Identification and analysis of deleterious human SNPs. J Mol Biol 356 1263-74.
Shi, Z., Sellers, J., and Moult, J. (2011). Protein stability and in vivo concentration of missense mutations in phenylalanine hydroxylase. Proteins, 80(1), 61-70 PMCID: 4170182.
Shi, Z., and Moult, J. (2011) Structural and functional impact of cancer-related missense somatic mutations, J Mol Biol 413, 495-512 PMCID: 4177034.
Complex trait disease:
Most common human diseases, such as Alzheimer’s, asthma, rheumatoid arthritis and many others, are complex in the sense that many variants across the genome contribute to disease risk and severity, and there is also a substantial environmental component. Primarily as a result of genome-wide association studies (GWAS), over 100 variants associated with disease risk are now known for some such diseases, for example Crohn’s. The Moult group is conducting extensive analysis of the molecular-level mechanisms underlying a number of complex trait diseases, including the introduction of new methods for identifying mechanism variants.
Gorlatova, N., Chao, K., Pal, L. R., Araj, R. H., Galkin, A., Turko, I., Moult, J., et al. (2011). Protein characterization of a Candidate Mechanism SNP for Crohn’s Disease: The Macrophage Stimulating Protein R689C Substitution. PloS one, 6(11), e27269. PMCID: PMC3210151
Pal, L. R., Yu, C. H., Mount, S. M., and Moult, J. (2015) Insights from GWAS: emerging landscape of mechanisms underlying complex trait disease, BMC Genomics 16 Suppl 8, S4. PMCID: 4480957.
Pal, L. R., and Moult, J. (2015) Genetic Basis of Common Human Disease: Insight into the Role of Missense SNPs from Genome-Wide Association Studies, J Mol Biol 427(13), 2271-89. PMCID: PMC4893807
Yu, C. H., Pal, L. R., and Moult, J. (2016) Consensus Genome-Wide Expression Quantitative Trait Loci and Their Relationship with Human Complex Trait Disease, OMICS 20(7), 400-14. PMID: 27428252
As noted above, genome-wide association studies (GWAS) have provided a wealth of new insight into which genetic loci are associated with disease phenotypes for many complex trait diseases. So far, though, identification of these loci has led to only limited advances in understanding the underlying disease mechanisms. Many questions remain unanswered, including which subsystems are involved, the extent and nature of epistatic effects, and possible new therapeutic strategies. Addressing these issues requires an appropriate representation and a means of integrating knowledge from many sources. We have utilized formal concepts of biological mechanism to develop a graphical framework that allows answers to these questions to be elucidated. The relationship between each disease-associated locus and a corresponding disease phenotype is represented by a mechanism chain linking the relevant genetic variant to a disease phenotype through a series of substate perturbations at the RNA, DNA, protein, cellular, organ, and other stages. Each pair of consecutive perturbed substates is connected by a perturbation mechanism. The set of intersecting mechanism chains forms a mechanism graph, whose features may be used for subsystem and epistatic analysis. A simple graphical language is used to represent the chain. User-friendly tools, including pull-down menus for ontology terms, allow approved contributors to build and edit chains. This work is in conjunction with Professor Lindley Darden at UM College Park.
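The chain and graph representation described above can be illustrated with a minimal Python sketch. It is not the group's implementation: the class names (Substate, Step, MechanismChain), the helper functions, and every variant, substate, and mechanism label are hypothetical placeholders chosen only to show how chains that pass through the same perturbed substate intersect to form a graph.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Substate:
    """A perturbed substate at one stage (e.g. DNA, RNA, protein, cell, organ)."""
    stage: str
    label: str


@dataclass(frozen=True)
class Step:
    """Two consecutive perturbed substates joined by a perturbation mechanism."""
    source: Substate
    mechanism: str
    target: Substate


@dataclass
class MechanismChain:
    """An ordered series of steps from a genetic variant to a disease phenotype."""
    name: str
    steps: List[Step]

    def substates(self) -> List[Substate]:
        """All substates along the chain, from variant to phenotype."""
        if not self.steps:
            return []
        return [self.steps[0].source] + [s.target for s in self.steps]

    def is_connected(self) -> bool:
        """True if each step starts where the previous step ended."""
        return all(a.target == b.source
                   for a, b in zip(self.steps, self.steps[1:]))


def build_graph(chains: List[MechanismChain]) -> Dict[Substate, Set[Tuple[str, Substate]]]:
    """Merge chains into one graph: substate -> {(mechanism, next substate)}."""
    graph: Dict[Substate, Set[Tuple[str, Substate]]] = defaultdict(set)
    for chain in chains:
        for step in chain.steps:
            graph[step.source].add((step.mechanism, step.target))
            graph[step.target]  # make sure terminal substates appear as nodes
    return graph


def shared_substates(chains: List[MechanismChain]) -> Dict[Substate, Set[str]]:
    """Substates visited by more than one chain, i.e. where chains intersect."""
    membership: Dict[Substate, Set[str]] = defaultdict(set)
    for chain in chains:
        for substate in chain.substates():
            membership[substate].add(chain.name)
    return {s: names for s, names in membership.items() if len(names) > 1}


# Placeholder content only: none of these labels describe real findings.
variant_a = Substate("DNA", "variant A (placeholder)")
variant_b = Substate("DNA", "variant B (placeholder)")
protein_a = Substate("protein", "protein A function altered (placeholder)")
rna_b = Substate("RNA", "transcript B level altered (placeholder)")
pathway = Substate("cell", "pathway P activity altered (placeholder)")
phenotype = Substate("phenotype", "disease D risk altered (placeholder)")

chain_a = MechanismChain("chain A", [
    Step(variant_a, "amino acid substitution", protein_a),
    Step(protein_a, "placeholder mechanism 1", pathway),
    Step(pathway, "placeholder mechanism 3", phenotype),
])
chain_b = MechanismChain("chain B", [
    Step(variant_b, "altered expression", rna_b),
    Step(rna_b, "placeholder mechanism 2", pathway),
    Step(pathway, "placeholder mechanism 3", phenotype),
])

graph = build_graph([chain_a, chain_b])
shared = shared_substates([chain_a, chain_b])
```

In this toy example the two chains converge at the cell-level substate and at the phenotype, so those two nodes are the candidates one would inspect first for subsystem or epistatic analysis. A real implementation would presumably draw stage and mechanism labels from controlled ontologies rather than free text.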
Presentation of this work at The Genomics of Common Diseases 2016 conference: pdf
Darden, L., Pal, L. R., Kundu, K., and Moult, J. (2017) The product guides the process: Discovering disease mechanisms. In press: Building Theories, Edited by Ippoliti, E., and Danks, D., Springer. Preprint available: Preprint.
Computational biology differs from traditional science in that it takes place in a virtual world. Achieving rigor in a computational world which the scientist controls is much harder than when dealing with the inflexible realities of the physical world. We introduced community assessment experiments in computational biology to help achieve the same rigor as in real-world science. CASP (Critical Assessment of Structure Prediction), the first framework for these experiments, is an organization that conducts double-blind, community-wide experiments to determine the state of the art of computational methods for modeling protein structure from amino acid sequence and other information. CASP has now been running for over 20 years, with continuing high participation rates (over 100 groups around the world), and has been accompanied by an enormous improvement in the accuracy of protein modeling methods. The CASP methodology has now been adopted in a wide range of computational biology areas, including protein-protein interactions, genome sequence annotation, biological networks, and protein function annotation. John Moult is founder and chair of CASP.
Moult, J. (2006) Rigorous performance evaluation in protein structure modelling and implications for computational biology, Philos Trans R Soc Lond B Biol Sci 361, 453-458. PMCID: 1609338.
Kryshtafovych A, Moult J, Baslé A, Burgin A, Craig TK, Edwards RA, Fass D, Hartmann MD, Korycinski M, Lewis RJ, Lorimer D, Lupas AN, Newman J, Peat TS, Piepenbrink KH, Prahlad J, van Raaij MJ, Rohwer F, Segall AM, Seguritan V, Sundberg EJ, Singh AK, Wilson MA, Schwede T. (2015) Some of the most interesting CASP11 targets through the eyes of their authors. Proteins. doi: 10.1002/prot.24942. [Epub ahead of print] PMCID: 4072496.
Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T., and Tramontano, A. (2016) Critical assessment of methods of protein structure prediction (CASP) - progress and new directions in Round XI, Proteins doi: 10.1002/prot.25064 [Epub ahead of print]
Full utilization of the flood of new genomic data related to human disease requires reliable and effective computational methods for deducing phenotype from genotype. Using an organizational framework similar to that of CASP, CAGI (Critical Assessment of Genome Interpretation) conducts double-blind community experiments to determine the state of the art for these methods, particularly in the area of human disease. Challenges range over monogenic disease, complex trait disease, and cancer. John Moult is co-founder and co-chair of CAGI. Moult group members also participate in CAGI, providing rigorous testing of our methods.
Link to CAGI website.
Computational studies of protein structure and function:
Current work in the group builds on a long history of algorithm development and computational studies of protein structure and function. Algorithm development includes one of the first protein structure refinement methods; the first protocol for Monte Carlo simulation of protein solvation, allowing a detailed analysis of solvation energetics; a novel systematic search procedure for ab initio modeling of short regions of polypeptide chain; a first application of genetic algorithms for fragment-based assembly of complete protein structures; and a graph-based method for identifying internally self-consistent protein conformations for homology modeling of protein structure. Major insights include establishing that protein molecules are surrounded by a shell of ordered and semi-ordered water molecules; showing, in a molecular dynamics study, that protein folding is formally chaotic; showing that protein folding is an NP-hard problem; and establishing that the active sites of enzymes are often conformationally strained.