Research article: Estimating sufficient statistics in co-evolutionary analysis by mutual information

  • Authors:
  • Philipp Weil; Franziska Hoffgaard; Kay Hamacher

  • Affiliations:
  • Theoretical Biology and Bioinformatics, Institute of Microbiology and Genetics, Department of Biology, TU Darmstadt, Schnittspahnstr. 10, 64287 Darmstadt, Germany (all three authors)

  • Venue:
  • Computational Biology and Chemistry
  • Year:
  • 2009

Abstract

Mutual information (MI) is a standard measure in information theory for detecting and quantifying correlated signals and events in both empirical data sets and theoretical models. In computational biology, MI has proven particularly useful in studies of co-evolutionary signals between sites within biomolecules. A key issue in applying MI, however, is the choice of a correct reference system, or null model, for understanding finite-size effects in the underlying finite data set. Although some bioinformatics studies provide rigorous results for theoretical, well-designed random distributions, data from real-world proteins have never been used to quantify the effect of finite sample size. Yet the impact of real-world statistics is most relevant for researchers in all fields concerned with detecting evolutionary signals in biological sequences. We present results on such effects in finite-sized biological data sets and point to future research directions. We focus primarily on bacterial ribosomal proteins as a prototypical example in molecular evolution. We compare our results with previously published suggestions, give an empirical formula, and propose a protocol to guide future research projects.
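The abstract does not state the authors' empirical formula or protocol, so the following is only a minimal sketch of the underlying problem, under stated assumptions. It computes the plug-in (maximum-likelihood) MI between two alignment columns and estimates its finite-size bias with a column-shuffling null model, a common choice in the co-evolution literature but not necessarily the reference system used in this article. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(counts, n):
    """Plug-in (maximum-likelihood) Shannon entropy in bits."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)


def mutual_information(col_x, col_y):
    """Plug-in MI, in bits, between two equally long alignment columns.

    MI(X;Y) = H(X) + H(Y) - H(X,Y), with all entropies estimated from
    raw symbol frequencies (no finite-size correction applied).
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must have equal length")
    n = len(col_x)
    h_x = entropy(Counter(col_x), n)
    h_y = entropy(Counter(col_y), n)
    h_xy = entropy(Counter(zip(col_x, col_y)), n)
    return h_x + h_y - h_xy


def shuffled_null(col_x, col_y, n_shuffles=200, seed=0):
    """MI values after randomly permuting one column.

    Shuffling destroys any pairing between the two sites while keeping
    each column's residue composition, so the true MI is zero; any
    positive values come from the finite sample size alone.
    (Hypothetical null model, assumed for illustration.)
    """
    rng = random.Random(seed)
    y = list(col_y)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(y)
        null.append(mutual_information(col_x, y))
    return null


def corrected_mi(col_x, col_y, n_shuffles=200, seed=0):
    """Observed MI minus the mean MI of the shuffled null (hypothetical)."""
    observed = mutual_information(col_x, col_y)
    null = shuffled_null(col_x, col_y, n_shuffles, seed)
    return observed - sum(null) / len(null)


if __name__ == "__main__":
    x, y = "AAAACCCC", "LLLLVVVV"  # two perfectly covarying toy columns
    print(mutual_information(x, y))  # 1.0 bit
    print(sum(shuffled_null(x, y)) / 200)  # > 0 although true MI is 0
    print(corrected_mi(x, y))
```

With only eight sequences, roughly half of the shuffled pairings still yield a positive MI, so the null mean sits clearly above zero. This illustrates why a finite-size reference is needed before interpreting raw MI values as co-evolutionary signal.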