TreeRank: a similarity measure for nearest neighbor searching in phylogenetic databases

  • Authors:
  • Jason T. L. Wang;Huiyuan Shan;Dennis Shasha;William H. Piel

  • Affiliations:
  • New Jersey Institute of Technology;New Jersey Institute of Technology;New York University;University at Buffalo

  • Venue:
  • SSDBM '03 Proceedings of the 15th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2003

Abstract

Phylogenetic trees are unordered labeled trees in which each leaf node has a label and the order among siblings is unimportant. In this paper we propose a new similarity measure, called TreeRank, for phylogenetic trees and present an algorithm for computing TreeRank scores. Given a query or pattern tree P and a data tree D, the TreeRank score from P to D is a measure of the topological relationships in P that are found to be the same or similar in D. The proposed algorithm calculates the TreeRank score in O(M^2 + N) time where M is the number of nodes appearing in both P and D, and N is the number of nodes in D. We then develop a search engine that, given a query or pattern tree P and a database of trees D, finds and ranks the nearest neighbors of P in D where the "nearness" is measured by the proposed similarity function. This structure-based search engine is fully operational and is available on the World Wide Web.
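
The abstract does not give the TreeRank formula, so the following is only an illustrative sketch of the general idea: score how many pairwise topological relationships among the leaves of a pattern tree P are preserved in a data tree D, then rank a database by that score. Everything here is an assumption made for illustration, not the paper's definition: the nested-tuple tree encoding, the stand-in "steps up, steps down" relationship between two leaves, the exact-match scoring, and the function names `leaf_paths`, `up_down`, `similarity`, and `rank`. Leaf labels are assumed to be unique within a tree.

```python
from itertools import permutations

# Hypothetical encoding: a leaf is a string label, an internal node is a
# tuple of children. Sibling order carries no meaning (unordered trees).

def leaf_paths(tree, path=()):
    """Map each leaf label to its root-to-leaf path of child indices."""
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths

def up_down(paths, u, v):
    """Steps up from leaf u to the lowest common ancestor, then down to leaf v."""
    pu, pv = paths[u], paths[v]
    common = 0
    for a, b in zip(pu, pv):
        if a != b:
            break
        common += 1
    return (len(pu) - common, len(pv) - common)

def similarity(pattern, data):
    """Fraction of ordered leaf pairs of the pattern whose up/down
    relationship is found unchanged in the data tree (stand-in measure)."""
    pp, dp = leaf_paths(pattern), leaf_paths(data)
    pairs = list(permutations(sorted(pp), 2))
    if not pairs:
        return 0.0
    same = sum(
        1
        for u, v in pairs
        if u in dp and v in dp and up_down(pp, u, v) == up_down(dp, u, v)
    )
    return same / len(pairs)

def rank(pattern, database):
    """Return (score, name) pairs, best match first; ties broken by name."""
    scored = [(similarity(pattern, tree), name) for name, tree in database.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))
```

Because the relationship between two leaves depends only on depths relative to their lowest common ancestor, reordering siblings does not change the score, which matches the unordered-tree setting described above. This sketch compares all leaf pairs of P directly and makes no attempt to reproduce the O(M^2 + N) algorithm claimed in the abstract.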