An incremental mining algorithm for maintaining sequential patterns using pre-large sequences

  • Authors:
  • Tzung-Pei Hong;Ching-Yao Wang;Shian-Shyong Tseng

  • Affiliations:
  • Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan and Department of Computer Science and Engineering, National Sun Yat-sen Univers ...;Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan;Institute of Computer and Information Science, National Chiao-Tung University, Hsinchu 300, Taiwan

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.05

Visualization

Abstract

Mining useful information and helpful knowledge from large databases has evolved into an important research area in recent years. Among the classes of knowledge derived, finding sequential patterns in temporal transaction databases is very important since it can help model customer behavior. In the past, researchers usually assumed databases were static to simplify data-mining problems. In real-world applications, new transactions may be added into databases frequently. Designing an efficient and effective mining algorithm that can maintain sequential patterns as a database grows is thus important. In this paper, we propose a novel incremental mining algorithm for maintaining sequential patterns based on the concept of pre-large sequences to reduce the need for rescanning original databases. Pre-large sequences are defined by a lower support threshold and an upper support threshold that act as gaps to avoid the movements of sequences directly from large to small and vice versa. The proposed algorithm does not require rescanning original databases until the accumulative amount of newly added customer sequences exceeds a safety bound, which depends on database size. Thus, as databases grow larger, the numbers of new transactions allowed before database rescanning is required also grow. The proposed approach thus becomes increasingly efficient as databases grow.