Unordered Tree Mining with Applications to Phylogeny

  • Authors:
  • Dennis Shasha; Jason T. L. Wang; Sen Zhang

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

Frequent structure mining (FSM) aims to discover and extract patterns frequently occurring in structural data, such as trees and graphs. FSM finds many applications in bioinformatics, XML processing, Web log analysis, and so on. In this paper we present a new FSM technique for finding patterns in rooted unordered labeled trees. The patterns of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, or the same great-grandparent, etc. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|^2) time, where |T| is the number of nodes in T. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. To demonstrate the usefulness of our approach, we discuss its applications to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. We also describe extensions of our algorithms for undirected acyclic graphs (or free trees).
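
The cousin-pair definition in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it is a naive enumeration that walks each node's ancestor chain, so it runs in roughly O(|T|^2 · h) time for a tree of height h rather than the O(|T|^2) bound stated above. The function name `cousin_pairs`, the parent-map input format, the `max_dist` cutoff, and the distance convention (0 for a shared parent, 1 for a shared grandparent, and so on) are all assumptions made for illustration.

```python
from itertools import combinations


def cousin_pairs(parent, max_dist=None):
    """Enumerate cousin pairs of a rooted tree (illustrative sketch only).

    parent: dict mapping each node to its parent; the root maps to None.
    max_dist: optional cutoff on the (assumed) cousin distance.
    Returns a list of (u, v, k) where u and v share the ancestor that is
    k + 1 levels above both of them (k = 0 means same parent).
    """

    def ancestors(node):
        # chain[0] is the parent, chain[1] the grandparent, and so on.
        chain = []
        while parent[node] is not None:
            node = parent[node]
            chain.append(node)
        return chain

    anc = {node: ancestors(node) for node in parent}
    pairs = []
    for u, v in combinations(sorted(parent), 2):
        # Compare ancestors generation by generation; the first match is
        # the lowest ancestor both nodes share at the same number of levels.
        for k, (a, b) in enumerate(zip(anc[u], anc[v])):
            if a == b:
                if max_dist is None or k <= max_dist:
                    pairs.append((u, v, k))
                break
    return pairs


# Hypothetical example tree: r has children a and b; a has c and d; b has e.
example = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b"}
result = cousin_pairs(example)
```

In this example, `a` and `b` share a parent, as do `c` and `d`, while `c` and `e` (and `d` and `e`) share only the grandparent `r`. Pairs at different depths, such as `a` and `c`, are not reported under this simplified convention.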