An efficient tree-based algorithm for mining sequential patterns with multiple minimum supports

  • Authors:
  • Ya-Han Hu;Fan Wu;Yi-Jiun Liao

  • Affiliations:
  • Department of Information Management, National Chung Cheng University, Chiayi 62102, Taiwan, ROC;Department of Information Management, National Chung Cheng University, Chiayi 62102, Taiwan, ROC;Department of Information Management, National Chung Cheng University, Chiayi 62102, Taiwan, ROC

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Sequential pattern mining (SPM) is an important technique for determining time-related behavior in sequence databases. In real-life applications, the frequencies for various items in a sequence database are not exactly equal. If all items are set with the same minimum support, the rare item problem may result, meaning that we are unable to effectively retrieve interesting patterns regardless of whether minsup is set too high or too low. Liu (2006) first included the concept of multiple minimum supports (MMSs) to SPM. It allows users to specify the minimum item support (MIS) for each item according to its natural frequency. A generalized sequential pattern-based algorithm, named Multiple Supports - Generalized Sequential Pattern (MS-GSP), was also developed to mine complete set of sequential patterns. However, the MS-GSP adopts candidate generate-and-test approach, which has been recognized as a costly and time-consuming method in pattern discovery. For the efficient mining of sequential patterns with MMSs, this study first proposes a compact data structure, called a Preorder Linked Multiple Supports tree (PLMS-tree), to store and compress the entire sequence database. Based on a PLMS-tree, we develop an efficient algorithm, Multiple Supports - Conditional Pattern growth (MSCP-growth), to discover the complete set of patterns. The experimental result shows that the proposed approach achieves more preferable findings than the MS-GSP and the conventional SPM.