Efficient algorithms for incremental maintenance of closed sequential patterns in large databases

  • Authors:
  • Lei Chang;Tengjiao Wang;Dongqing Yang;Hua Luan;Shiwei Tang

  • Affiliations:
  • School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Ch ...;School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Ch ...;School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Ch ...;School of Information, Renmin University of China, Beijing, China;School of Electronics Engineering and Computer Science, Peking University, Beijing, China and Key Laboratory of Machine Perception (Ministry of Education), Peking University

  • Venue:
  • Data & Knowledge Engineering
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Recent study shows that mining compact frequent patterns (such as closed patterns and compressed patterns) can alleviate the interpretability and efficiency problem encountered by traditional frequent pattern mining methods. Compact frequent patterns keep exact or approximate supports of a complete set of frequent patterns, and the number of them is often orders of magnitude smaller. Several efficient algorithms have been proposed to mine compact sequential patterns. However, sequence databases are not always static. Sequences (or items) are often added to and deleted from databases. A slight change made on a database may lead to the change of compact patterns. Mining from scratch is very time-consuming and thus infeasible. In this paper, we explore how to efficiently maintain closed sequential patterns in a dynamic sequence database environment. A compact structure CSTree is designed to keep closed sequential patterns, and its nice properties are carefully studied. Two efficient algorithms, IMCS"A and IMCS"D, are developed to maintain the CSTree upon incremental update. The algorithms make full use of the properties of CSTree to find nodes whose states are obsolete and avoid unnecessary node extension and closure checking operations to accelerate the incremental update process. A thorough experimental study on various real and synthetic datasets shows that the proposed algorithms outperform the state-of-the-art algorithms - PrefixSpan, CloSpan, BIDE and a recently proposed incremental mining algorithm IncSpan by about a factor of 4 to more than an order of magnitude.