Efficient strategies for tough aggregate constraint-based sequential pattern mining

Authors:
Enhong Chen (Department of Computer Science, University of Science and Technology of China, Hefei, Anhui, PR China)
Huanhuan Cao (Department of Computer Science, University of Science and Technology of China, Hefei, Anhui, PR China)
Qing Li (Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong)
Tieyun Qian (Department of Computer Science, Wuhan University, Wuhan, Hubei, PR China)
Venue:
Information Sciences: an International Journal
Year:
2008

Abstract

Frequent sequential pattern mining with constraints is the task of discovering patterns by incorporating user-defined constraints into the mining process. This not only improves mining efficiency but also makes the discovered patterns better meet user requirements. Although many studies have addressed constrained sequential pattern mining, few have dealt with "tough aggregate constraints", because such constraints are difficult to push into the mining process. In this paper we provide efficient strategies for handling tough aggregate constraints. Through a theoretical analysis of tough aggregate constraints based on the concept of the total contribution of sequences, we first show that two typical kinds of these constraints can be transformed into the same form and can therefore be processed in a uniform way. We then propose a novel algorithm, PTAC (sequential frequent Patterns mining with Tough Aggregate Constraints), which incorporates two effective strategies to reduce the cost of using tough aggregate constraints. The first avoids checking data items one by one by exploiting the promisingness of certain other items and the validity of the corresponding prefix. The second avoids constructing unnecessary projected databases by pruning unpromising new patterns that would otherwise serve as new prefixes. Experimental studies on synthetic datasets generated by the IBM sequence generator, as well as on a real dataset, show that with these strategies our algorithm performs well in both speed and space.
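The two ideas in the abstract, rewriting tough aggregate constraints into one common "total contribution" form and pruning a prefix before its projected database is explored, can be illustrated with a small sketch. It is not PTAC. It assumes (the abstract does not say so) that the two constraint kinds are sum and average constraints compared with `>=`, and it reads "total contribution" as a per-item contribution summed over the pattern: `sum(P) >= v` keeps the item values and threshold `v`, while `avg(P) >= v` becomes `sum(value(i) - v) >= 0` for a non-empty pattern. Each sequence element is a single item, and the names `contributions_for_avg`, `mine` and `grow` are hypothetical.

```python
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]


def contributions_for_avg(values: Dict[str, float], v: float) -> Dict[str, float]:
    """Assumed rewrite: avg(P) >= v  <=>  sum(values[i] - v for i in P) >= 0."""
    return {item: val - v for item, val in values.items()}


def mine(db: List[Sequence], contrib: Dict[str, float], threshold: float,
         min_sup: int) -> Dict[Sequence, int]:
    """Frequent patterns P (single-item elements) with sum of contrib over P >= threshold.

    Hypothetical PrefixSpan-style sketch: a prefix is not expanded when fewer
    than min_sup of its projected suffixes could still reach the threshold.
    """
    results: Dict[Sequence, int] = {}

    def grow(prefix: Sequence, total: float, suffixes: List[Sequence]) -> None:
        # Upper bound per suffix: current total plus every positive contribution left in it.
        promising = sum(
            1 for s in suffixes
            if s and total + sum(max(contrib[i], 0.0) for i in s) >= threshold
        )
        if promising < min_sup:
            return  # no extension can be both frequent and valid: skip projection
        # Support = number of suffixes (one per original sequence) containing the item.
        counts: Dict[str, int] = {}
        for s in suffixes:
            for item in set(s):
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            if counts[item] < min_sup:
                continue
            new_prefix = prefix + (item,)
            new_total = total + contrib[item]
            if new_total >= threshold:
                results[new_prefix] = counts[item]
            # Project on the first occurrence of the item in each suffix.
            projected = [s[s.index(item) + 1:] for s in suffixes if item in s]
            grow(new_prefix, new_total, projected)

    grow((), 0.0, list(db))
    return results
```

The bound is safe because any extension of a prefix is a subsequence of the suffix after the prefix's first occurrence, so its added contribution cannot exceed the sum of that suffix's positive contributions. For example, with item values a=1, b=5, c=3, the database `[("a","b","c"), ("a","c"), ("b","c"), ("a","b")]`, `min_sup=2` and the constraint `avg(P) >= 3`, calling `mine(db, contributions_for_avg(values, 3), 0.0, 2)` keeps `b`, `c`, `ab` and `bc`.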