Efficiently mining frequent trees in a forest

  • Authors:
  • Mohammed J. Zaki

  • Affiliations:
  • Rensselaer Polytechnic Institute, Troy NY

  • Venue:
  • Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semistructured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TREEMINER, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TREEMINER with a pattern matching tree mining algorithm (PATTERNMATCHER). We conduct detailed experiments to test the performance and scalability of these methods. We find that TREEMINER outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.