Mining Emerging Substrings

  • Authors:
  • Sarah Chan;Ben Kao;C. L. Yip;Michael Tang

  • Venue:
  • DASFAA '03 Proceedings of the Eighth International Conference on Database Systems for Advanced Applications
  • Year:
  • 2003

Abstract

We introduce a new type of KDD pattern called emerging substrings. In a sequence database, an emerging substring (ES) of a data class is a substring that occurs more frequently in that class than in other classes. ESs are important to sequence classification because they capture significant contrasts between data classes and provide insights for the construction of sequence classifiers. We propose a suffix tree-based framework for mining ESs and study the effectiveness of applying one or more pruning techniques in different stages of our ES mining algorithm. Experimental results show that if the target class is small relative to the whole database, which is the normal scenario in single-class ES mining, most of the pruning techniques achieve considerable performance gains.
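
To make the ES definition concrete, the following is a minimal brute-force sketch in Python, not the suffix tree-based framework the abstract refers to. It assumes, purely for illustration, that "occurs more frequently" is measured by per-class support (the fraction of sequences containing the substring) and that a substring qualifies when its support in the target class meets a minimum and exceeds its support in the other classes by a growth factor. The function names and the `min_support` and `min_growth` parameters are hypothetical and do not come from the paper.

```python
from collections import Counter


def substring_support(sequences):
    """Map each substring to the fraction of sequences that contain it."""
    counts = Counter()
    for seq in sequences:
        # Collect distinct substrings so each sequence counts at most once.
        seen = {seq[i:j]
                for i in range(len(seq))
                for j in range(i + 1, len(seq) + 1)}
        counts.update(seen)
    n = len(sequences)
    return {s: c / n for s, c in counts.items()}


def emerging_substrings(target, others, min_support=0.5, min_growth=2.0):
    """Return substrings that are frequent in `target` and rare in `others`.

    Assumed criteria (illustrative only): support in the target class is at
    least `min_support`, and the ratio of target support to support in the
    other classes is at least `min_growth`.
    """
    sup_target = substring_support(target)
    sup_others = substring_support(others)
    result = {}
    for s, st in sup_target.items():
        if st < min_support:
            continue
        so = sup_others.get(s, 0.0)
        # A substring absent from the other classes has unbounded growth.
        growth = float("inf") if so == 0.0 else st / so
        if growth >= min_growth:
            result[s] = (st, so)
    return result


if __name__ == "__main__":
    target = ["abcd", "abce", "xabc"]
    others = ["bcd", "cde", "bce", "abx"]
    for s, (st, so) in sorted(emerging_substrings(target, others, 1.0).items()):
        print(f"{s!r}: target support {st:.2f}, other support {so:.2f}")
```

This enumeration is quadratic in the length of each sequence, so it is only suitable for small examples; avoiding that cost is the motivation for an index structure such as a suffix tree together with pruning.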