Discovery by Minimal Length Encoding: A Case Study in Molecular Evolution

  • Authors:
  • Aleksandar Milosavljević; Jerzy Jurka

  • Affiliations:
  • Linus Pauling Institute of Science and Medicine, 440 Page Mill Rd., Palo Alto, CA 94306. Current address: Genome Structure Group, Biological and Medical Research Division, Argonne National ...;Linus Pauling Institute of Science and Medicine, 440 Page Mill Rd., Palo Alto, CA 94306. JURKA@JMULLINS@STANFORD.EDU

  • Venue:
  • Machine Learning
  • Year:
  • 1993

Abstract

We apply the Minimal Length Encoding Principle to formalize inference about the evolution of macromolecular sequences. The Principle is shown to imply a combination of Weighted Parsimony and Compatibility methods that have long been used by biologists because of their good practical performance. The background assumptions are expressed as an encoding scheme for the observed data and as heuristic rules for selection of diagnostic positions in the sequences. The Principle was applied to discover new subfamilies of Alu sequences, the most numerous family of repetitive DNA sequences in the human genome.
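
As a rough illustration of how comparing encoding lengths can favor splitting a family into subfamilies, the Python sketch below scores a toy set of aligned sequences under one consensus and under two. The cost model (2 bits per consensus base, a position plus a substituted base per difference, a fixed-length subfamily label per sequence) and the names `mismatches`, `diff_bits`, and `encoding_length` are assumptions made for this example. They are not the encoding scheme of the paper, which the abstract does not specify.

```python
import math


def mismatches(seq, consensus):
    """Count positions where two aligned, equal-length sequences differ."""
    return sum(1 for a, b in zip(seq, consensus) if a != b)


def diff_bits(n_diffs, length):
    """Toy cost of n_diffs substitutions: position plus one of 3 other bases."""
    return n_diffs * (math.log2(length) + math.log2(3))


def encoding_length(seqs, consensuses):
    """Total bits to describe seqs given one or more consensus sequences.

    Assumed toy scheme: the first consensus is spelled out at 2 bits per base,
    further consensuses are stored as differences from the first, and each
    sequence is stored as a subfamily label plus its differences from the
    closest consensus. The cost of delimiting the difference lists is ignored.
    """
    length = len(consensuses[0])
    total = 2.0 * length
    for c in consensuses[1:]:
        total += diff_bits(mismatches(c, consensuses[0]), length)
    label_bits = math.log2(len(consensuses))  # 0 bits when there is one consensus
    for s in seqs:
        closest = min(mismatches(s, c) for c in consensuses)
        total += label_bits + diff_bits(closest, length)
    return total


# Hypothetical data: a parent consensus and a variant differing at two positions.
parent = "ACGTACGTAC"
variant = "ACTTACGAAC"
seqs = [parent] * 3 + [variant] * 4

one = encoding_length(seqs, [parent])
two = encoding_length(seqs, [parent, variant])
print(f"one family: {one:.1f} bits, two subfamilies: {two:.1f} bits")
```

In this toy setting the two positions shared by the four variant copies act as diagnostic positions: describing them once in a second consensus is cheaper than repeating them in every sequence, so the two-subfamily description is shorter. When all sequences match the parent, the extra consensus and labels only add bits, and the single-family description is preferred.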