Structural bioinformatics

Joint modelling of sequence and structure evolution

For sequences that are highly divergent, there is often insufficient information to infer accurate alignments, and phylogenetic uncertainty may be high. One way to address this issue is to make use of protein structural information, since structures generally diverge more slowly than sequences. In order to combine structural information with statistical models of sequence evolution, we developed a stochastic model of multiple structures on a phylogenetic tree, analytically integrating over ancestral structures to permit efficient likelihood computations under the resulting joint sequence–structure model.
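
The idea of analytically integrating over an ancestral structure can be illustrated with a minimal sketch. It assumes, purely for illustration, a single coordinate evolving by Brownian motion from a Gaussian-distributed root to two descendant structures; the diffusion model, the parameter names (`sigma2`, `root_var`) and the two-leaf tree are assumptions made here, not the published model. Under these assumptions the two observed coordinates are jointly Gaussian, so the ancestor can be integrated out in closed form, and the result can be checked against brute-force numerical integration over the ancestral coordinate.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def pair_likelihood_analytic(x1, x2, t1, t2, sigma2=1.0, root_var=10.0):
    """Joint density of two leaf coordinates with the ancestor integrated out.

    Assumed toy model: root x0 ~ N(0, root_var), and each leaf i satisfies
    x_i | x0 ~ N(x0, sigma2 * t_i). The marginal of (x1, x2) is then a
    bivariate normal with covariance [[a, c], [c, b]] as defined below.
    """
    a = root_var + sigma2 * t1   # variance of leaf 1
    b = root_var + sigma2 * t2   # variance of leaf 2
    c = root_var                 # covariance from the shared ancestor
    det = a * b - c * c
    quad = (b * x1 * x1 - 2.0 * c * x1 * x2 + a * x2 * x2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def pair_likelihood_numeric(x1, x2, t1, t2, sigma2=1.0, root_var=10.0,
                            n=20001, half_width=40.0):
    """The same quantity, by trapezoidal integration over the ancestor x0."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        x0 = -half_width + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * (normal_pdf(x0, 0.0, root_var)
                           * normal_pdf(x1, x0, sigma2 * t1)
                           * normal_pdf(x2, x0, sigma2 * t2))
    return total * h


analytic = pair_likelihood_analytic(1.2, 0.7, t1=0.3, t2=0.5)
numeric = pair_likelihood_numeric(1.2, 0.7, t1=0.3, t2=0.5)
print(analytic, numeric)
```

The closed-form version costs a handful of arithmetic operations per coordinate, whereas the numerical version needs thousands of evaluations. This difference is what makes likelihood computations practical when they must be repeated many times during inference.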

The inclusion of structural information significantly reduces alignment and topology uncertainty, and reduces the number of topology and alignment errors in cases where the true trees and alignments are known. In some cases, the inclusion of structure results in changes to the consensus topology, indicating that structure may contain additional information beyond that which can be obtained from sequences.

(software implemented in Java)

CJ Challis and SC Schmidler (2012), "A stochastic evolutionary model for protein structure alignment and phylogeny", Molecular Biology and Evolution, Vol 29, No 11, pp 3575-3587.
JL Herman, CJ Challis, Á Novák, J Hein and SC Schmidler (2014), "Simultaneous Bayesian estimation of alignment and phylogeny under a joint model of protein sequence and structure", Molecular Biology and Evolution, Vol 31, No 9, pp 2251-2266.
 
 

Modelling structural evolution with between-site correlations

Although the above methodology accounts for phylogenetic correlations between structures, it assumes independence between atoms for the sake of computational tractability. We have also examined non-phylogenetic structural models that allow for correlations between sites, using a simplified model of protein dynamics. Preliminary results suggest that such a model may be useful in distinguishing evolutionary drift from dynamic fluctuations in protein structures.


JL Herman, R Lyngsø and J Hein (2011), “Statistical alignment of multiple protein structures under a dynamics-based model of structural evolution” in Next Generation Statistics in Biosciences (LASR 2011), A. Gusnanto, K.V. Mardia and C.J. Fallaize eds., Leeds University Press.

JL Herman, R Lyngsø and J Hein (2011), “Probabilistic models for protein structure alignment”, Poster presented at Hierarchical Models and Markov Chain Monte Carlo, conference in honour of Adrian F.M. Smith, Crete.

 
 

Predicting three-dimensional contacts in protein structures using evolutionary couplings

We have been using a mixture-of-forests approach to model correlated evolution in genetic sequences, and applying this methodology to identify putative couplings between residues. With the large amount of sequence data now available, it is in many cases possible to derive highly accurate contact predictions from this sequence information alone.
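
As a rough illustration of how tree-structured models can pick out coupled alignment columns, the sketch below builds a single Chow-Liu tree: it scores every pair of columns by mutual information and keeps a maximum-weight spanning tree over those scores. This is a simpler, related technique chosen here for illustration, not the mixture-of-forests method itself; the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Empirical mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in count_ab.items():
        p_joint = count / n
        mi += p_joint * math.log(p_joint / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def chow_liu_tree(alignment):
    """Maximum-weight spanning tree over columns, weighted by mutual information.

    Returns a list of (column_i, column_j, mi) edges in the order they were
    accepted, i.e. strongest coupling first (Kruskal's algorithm).
    """
    columns = list(zip(*alignment))
    n_cols = len(columns)
    edges = sorted(
        ((mutual_information(columns[i], columns[j]), i, j)
         for i, j in combinations(range(n_cols), 2)),
        reverse=True,
    )
    parent = list(range(n_cols))  # union-find forest used to avoid cycles

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    tree = []
    for mi, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j, mi))
    return tree


# Hypothetical toy alignment: columns 0 and 1 covary perfectly (A-K, D-R).
toy_alignment = ["AKLE", "AKIK", "DRLK", "DRIE", "AKLE", "DRIK"]
tree = chow_liu_tree(toy_alignment)
for i, j, mi in tree:
    print(f"columns {i}-{j}: MI = {mi:.3f}")
```

In this toy case the strongest edge links the two perfectly covarying columns. With real alignments, raw mutual information is also inflated by shared phylogenetic history and by small sample sizes, so high-scoring edges are at best candidate couplings rather than confirmed contacts.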

JL Herman, A Cumberworth, AM Grigore, NMD Niezink and J Hein (in preparation), “Identification of evolutionary couplings from multiple sequence alignments using mixtures of forests”