Top-down induction of phylogenetic trees

  • Authors: Celine Vens, Eduardo Costa, Hendrik Blockeel

  • Affiliation (all authors): Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium

  • Venue: EvoBIO'10, Proceedings of the 8th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
  • Year: 2010

Abstract

We propose a novel distance-based method for phylogenetic tree reconstruction. The method builds on a conceptual clustering approach that extends well-known decision tree learning. It starts from a single cluster containing all sequences and repeatedly splits it into subclusters until each sequence forms its own cluster. We assume that a split can be described by referring to particular polymorphic locations, which limits the number of candidate splits and makes such a divisive method computationally feasible. To select the best split, we use a criterion close to Neighbor Joining’s optimization criterion, namely minimizing total branch length. A thorough experimental evaluation shows that our method yields phylogenetic trees with an accuracy comparable to that of existing methods. Moreover, it has three important advantages. First, by listing the polymorphic locations at the internal nodes, it provides an explanation for the resulting tree topology. Second, the top-down tree-growing process can be stopped before a complete tree is generated, yielding an efficient approach to gene or protein subfamily identification. Third, the resulting trees can be used as classification trees to assign new sequences to subfamilies.
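
To make the divisive procedure described in the abstract more concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes aligned sequences of equal length, uses Hamming distance as the pairwise distance, and scores a split by size-weighted within-cluster spread. That score is a simplified stand-in chosen for illustration; it is not the total-branch-length criterion used in the paper. All function names (`grow`, `classify`, `split_score`, and so on) and the `max_depth` parameter are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise(seqs):
    """Average pairwise Hamming distance within a cluster (0.0 for a singleton)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def candidate_splits(seqs):
    """Yield one split per (polymorphic position, residue) pair.

    A split sends sequences that carry `residue` at `pos` to one side
    and all other sequences to the other side.
    """
    for pos in range(len(seqs[0])):
        residues = {s[pos] for s in seqs}
        if len(residues) < 2:
            continue  # position is not polymorphic in this cluster
        for residue in sorted(residues):
            yes = [s for s in seqs if s[pos] == residue]
            no = [s for s in seqs if s[pos] != residue]
            yield pos, residue, yes, no


def split_score(yes, no):
    """Stand-in heuristic (assumption): size-weighted within-cluster spread.

    Lower is better. The paper instead minimizes total branch length.
    """
    return len(yes) * mean_pairwise(yes) + len(no) * mean_pairwise(no)


def grow(seqs, max_depth=None, depth=0):
    """Recursively split a cluster; stop early when max_depth is reached."""
    if len(seqs) == 1 or (max_depth is not None and depth >= max_depth):
        return {"leaf": sorted(seqs)}
    best = min(candidate_splits(seqs),
               key=lambda c: split_score(c[2], c[3]),
               default=None)
    if best is None:  # remaining sequences are identical
        return {"leaf": sorted(seqs)}
    pos, residue, yes, no = best
    return {
        "test": (pos, residue),  # polymorphic location explaining the split
        "yes": grow(yes, max_depth, depth + 1),
        "no": grow(no, max_depth, depth + 1),
    }


def classify(tree, seq):
    """Follow the tests at the internal nodes to the cluster for a sequence."""
    while "leaf" not in tree:
        pos, residue = tree["test"]
        tree = tree["yes"] if seq[pos] == residue else tree["no"]
    return tree["leaf"]


seqs = ["AAAA", "AAAT", "GGCC", "GGCA"]
full_tree = grow(seqs)                   # split until every sequence is alone
subfamilies = grow(seqs, max_depth=1)    # stop early: two coarse clusters
print(subfamilies["test"])
print(classify(subfamilies, "AATA"))     # route an unseen sequence
```

In this sketch, each internal node records the position and residue used for its split, which mirrors the idea of explaining the topology through polymorphic locations. Passing `max_depth` stops the growth early to obtain coarse clusters, and `classify` reuses the same tests to place a new sequence.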