Fast discovery of sequential patterns in large databases using effective time-indexing

  • Authors:
  • Ming-Yen Lin;Sue-Chen Hsueh;Chia-Wen Chang

  • Affiliations:
  • Department of Information Engineering and Computer Science, Feng Chia University, No. 100 Wenhwa Road, Seatwen, Taichung 40724, Taiwan;Department of Information Management, Chaoyang University of Technology, Taiwan;Department of Information Engineering and Computer Science, Feng Chia University, No. 100 Wenhwa Road, Seatwen, Taichung 40724, Taiwan

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2008

Abstract

Sequential pattern mining algorithms can often produce more accurate results if they work with specific constraints in addition to the support threshold. Many systems implement time-independent constraints by selecting qualified patterns. This selection cannot implement time-dependent constraints, because the support computation process must validate the time attributes of every data sequence during mining. Therefore, we propose a memory time-indexing approach, called METISP, to discover sequential patterns with time constraints including minimum-gap, maximum-gap, exact-gap, sliding window, and duration constraints. METISP scans the database into memory and constructs time-index sets for effective processing. METISP uses index sets and a pattern-growth strategy to mine patterns without generating any candidates or sub-databases. The index sets narrow down the search space to the sets of designated in-memory data sequences, and speed up the counting of potential items within the indicated ranges. Our comprehensive experiments show that METISP has better efficiency, even with low support and large databases, than the well-known GSP and DELISP algorithms. METISP scales up linearly with respect to database size.
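
To make the time constraints in the abstract concrete, the following is a minimal sketch of how support could be counted under minimum-gap, maximum-gap, and duration constraints. It is not METISP itself: it uses a plain backtracking scan rather than the paper's time-index sets, and it omits the sliding-window constraint. The names `supports` and `support_count`, the `(timestamp, itemset)` representation, and the choice of inclusive bounds (a gap `g` qualifies when `mingap <= g <= maxgap`) are assumptions made for illustration. Under that convention an exact-gap constraint can be approximated by setting `mingap == maxgap`.

```python
from typing import FrozenSet, List, Optional, Sequence, Tuple

# A data sequence is a time-ordered list of (timestamp, items) transactions.
# Timestamps are assumed to be strictly increasing within a sequence.
Transaction = Tuple[int, FrozenSet[str]]
DataSequence = List[Transaction]
Pattern = Sequence[FrozenSet[str]]


def supports(seq: DataSequence, pattern: Pattern, mingap: int = 0,
             maxgap: Optional[int] = None,
             duration: Optional[int] = None) -> bool:
    """Return True if seq contains pattern under the given time constraints.

    Assumed (hypothetical) conventions: gaps between adjacent matched
    elements satisfy mingap <= gap <= maxgap, and the last matched
    timestamp minus the first is at most duration.
    """

    def extend(k: int, prev_time: Optional[int],
               first_time: Optional[int]) -> bool:
        if k == len(pattern):
            return True  # every pattern element has been matched
        for time, items in seq:
            if prev_time is not None:
                gap = time - prev_time
                if gap <= 0 or gap < mingap:
                    continue  # not later than the previous match, or too close
                if maxgap is not None and gap > maxgap:
                    break  # seq is time-ordered, so later gaps only grow
            start = time if first_time is None else first_time
            if duration is not None and time - start > duration:
                break  # the whole occurrence would span too long
            # Element k matches if its items are a subset of the transaction.
            if pattern[k] <= items and extend(k + 1, time, start):
                return True
        return False

    return extend(0, None, None)


def support_count(db: List[DataSequence], pattern: Pattern,
                  **constraints: Optional[int]) -> int:
    """Count the data sequences in db that support pattern."""
    return sum(supports(seq, pattern, **constraints) for seq in db)


# Toy database (invented for illustration, not from the paper).
db: List[DataSequence] = [
    [(1, frozenset("a")), (3, frozenset("b")), (10, frozenset("c"))],
    [(1, frozenset("a")), (2, frozenset("ab")), (4, frozenset("c"))],
]
abc: Pattern = [frozenset("a"), frozenset("b"), frozenset("c")]
```

On this toy database, `support_count(db, abc)` is 2 with no constraints, while `maxgap=3`, `mingap=2`, or `duration=5` each reduce it to 1. This illustrates why the abstract says time-dependent constraints cannot be applied by filtering patterns afterwards: whether a sequence supports a pattern depends on the timestamps inspected during counting.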