Interactive sequence discovery by incremental mining

  • Authors:
  • Ming-Yen Lin; Suh-Yin Lee

  • Affiliations:
  • Department of Computer Science and Information Engineering, National Chiao Tung University, Taiwan 30050, Taiwan, ROC (both authors)

  • Venue:
  • Information Sciences—Informatics and Computer Science: An International Journal - Special issue: Informatics and computer science intelligent systems applications
  • Year:
  • 2004

Abstract

Sequential pattern mining has become a challenging data mining task because of its computational complexity. Mining algorithms discover all frequent patterns that meet a user-specified minimum support threshold. However, a user is unlikely to obtain satisfactory patterns from a single query. Usually the user must try several support thresholds before arriving at the desired set of patterns, so the time-consuming mining process has to be repeated several times. Current approaches are inadequate for such interactive mining because of the long processing time each query requires. To reduce the response time of each query during the interactive process, we propose a knowledge-base-assisted mining algorithm for interactive sequence discovery. The approach uses the knowledge acquired from each mining run and accumulates counting information so that patterns can be counted efficiently, which speeds up the whole interactive mining process. Furthermore, the knowledge base makes it possible to generate new candidate sets directly and to count the support of candidates of different sizes concurrently. For some queries, database access is not required at all, because the relevant pattern information is already kept in the knowledge base. Experiments show that our approach outperforms GSP, a state-of-the-art sequential pattern mining algorithm, by several orders of magnitude for interactive sequence discovery.
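The knowledge-base idea described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the class `KnowledgeBase`, the helper `contains`, and the GSP-style level-wise loop are hypothetical, and each pattern element is simplified to a single item. The sketch caches exact support counts across queries. A query whose threshold is at or above the lowest threshold already mined is answered from the cache with no database pass, while a lower threshold counts only the candidates whose support is still unknown.

```python
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

# Simplification (assumption): every pattern element is a single item,
# so a pattern is a tuple of items and a data sequence is a list of itemsets.
Pattern = Tuple[str, ...]
DataSequence = List[FrozenSet[str]]


def contains(seq: DataSequence, pattern: Pattern) -> bool:
    """True if the pattern's items occur, in order, in strictly later transactions."""
    k = 0
    for itemset in seq:
        if k < len(pattern) and pattern[k] in itemset:
            k += 1  # greedy earliest match suffices for plain containment
    return k == len(pattern)


class KnowledgeBase:
    """Hypothetical store of support counts accumulated across interactive queries."""

    def __init__(self, db: Sequence[DataSequence]) -> None:
        self.db = db
        self.items = sorted({i for seq in db for itemset in seq for i in itemset})
        self.support: Dict[Pattern, int] = {}  # exact count of every pattern counted so far
        self.mined_minsup: Optional[int] = None  # lowest threshold mined completely so far
        self.db_scans = 0  # database passes made while answering queries

    def _count(self, candidates: List[Pattern]) -> None:
        """Count, in one database pass, only the candidates with unknown support."""
        unknown = [c for c in candidates if c not in self.support]
        if not unknown:
            return  # every count is already in the knowledge base
        self.db_scans += 1
        counts = dict.fromkeys(unknown, 0)
        for seq in self.db:
            for c in unknown:
                if contains(seq, c):
                    counts[c] += 1
        self.support.update(counts)

    def query(self, minsup: int) -> Dict[Pattern, int]:
        """Return all patterns whose absolute support is at least minsup (>= 1)."""
        if minsup < 1:
            raise ValueError("minsup must be at least 1")
        if self.mined_minsup is not None and minsup >= self.mined_minsup:
            # Every pattern this frequent was counted in an earlier, lower-threshold run.
            return {p: n for p, n in self.support.items() if n >= minsup}
        result: Dict[Pattern, int] = {}
        candidates: List[Pattern] = [(i,) for i in self.items]
        while candidates:
            self._count(candidates)
            level = [c for c in candidates if self.support[c] >= minsup]
            result.update((c, self.support[c]) for c in level)
            frequent = set(level)
            # GSP-style join: p without its first item equals q without its last item.
            nxt: List[Pattern] = []
            for p in level:
                for q in level:
                    if p[1:] == q[:-1]:
                        c = p + (q[-1],)
                        # Apriori pruning: every one-item deletion must be frequent.
                        if all(c[:j] + c[j + 1:] in frequent for j in range(len(c))):
                            nxt.append(c)
            candidates = nxt
        self.mined_minsup = minsup
        return result
```

Raising the threshold after a query is answered entirely from `support`; lowering it reuses every count already stored, so only genuinely new candidates cost a database pass.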