Benchmarking the effectiveness of sequential pattern mining methods

Authors:
Hye-Chung Kum; Joong Hyuk Chang; Wei Wang
Affiliations:
Department of Computer Science, University of North Carolina at Chapel Hill, NC 27599, USA; Department of Computer Science, Yonsei University, Seoul 120-749, Korea; Department of Computer Science, University of North Carolina at Chapel Hill, NC 27599, USA
Venue:
Data & Knowledge Engineering
Year:
2007

Abstract

Recently, there has been increasing interest in new intelligent mining methods that find more meaningful and compact results. In intelligent data mining research, assessing the quality and usefulness of the results from different mining methods is essential. However, there are no general benchmarking criteria for evaluating whether these new methods are indeed more effective than traditional methods. Here we propose novel benchmarking criteria that can systematically evaluate the effectiveness of any sequential pattern mining method under a variety of situations. The benchmark evaluates how well a mining method finds known common patterns in synthetic data. Such an evaluation provides a comprehensive empirical understanding of the patterns generated by any mining method. In this paper, the criteria are applied to conduct a detailed comparison of the support-based sequential pattern model with an approximate pattern model based on sequence alignment. The study suggests that the alignment model gives a good summary of the sequential data in the form of a set of common patterns in the data. In contrast, the support model generates a massive number of frequent patterns with much redundancy. This suggests that the results of the support model require more post-processing before they can be of actual use in real applications.
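
To make the benchmarking idea concrete, here is a minimal sketch of scoring a miner's output against patterns that were deliberately planted in synthetic data. The abstract does not define the criteria, so the metric names and definitions used here (`recovery`, `spurious`, `redundant`) are assumptions made for illustration, not the measures defined in the paper. A pattern is modeled as a sequence of itemsets.

```python
from typing import Dict, FrozenSet, List, Sequence

Pattern = Sequence[FrozenSet[str]]  # a sequential pattern: ordered itemsets


def is_subsequence(small: Pattern, big: Pattern) -> bool:
    """True if each itemset of `small` is a subset of a distinct itemset
    of `big`, with the order of itemsets preserved (greedy matching)."""
    pos = 0
    for itemset in small:
        while pos < len(big) and not itemset <= big[pos]:
            pos += 1
        if pos == len(big):
            return False
        pos += 1
    return True


def n_items(pattern: Pattern) -> int:
    """Total number of items across all itemsets of a pattern."""
    return sum(len(itemset) for itemset in pattern)


def evaluate(planted: List[Pattern], mined: List[Pattern]) -> Dict[str, float]:
    """Score mined patterns against known planted patterns.

    Hypothetical metrics (assumed for this sketch, planted patterns non-empty):
      recovery  - mean, over planted patterns, of the largest fraction of
                  items captured by a mined pattern contained in it
      spurious  - mined patterns contained in no planted pattern
      redundant - matching mined patterns beyond one per recovered pattern
    """
    best = [0.0] * len(planted)
    spurious = 0
    matched = 0
    for m in mined:
        hits = [i for i, base in enumerate(planted) if is_subsequence(m, base)]
        if not hits:
            spurious += 1
            continue
        matched += 1
        for i in hits:
            best[i] = max(best[i], n_items(m) / n_items(planted[i]))
    recovered = sum(1 for score in best if score > 0)
    return {
        "recovery": sum(best) / len(best) if best else 0.0,
        "spurious": spurious,
        "redundant": max(0, matched - recovered),
    }


def make_pattern(*itemsets: str) -> Pattern:
    """Helper: make_pattern("a", "bc") -> [{a}, {b, c}]."""
    return [frozenset(s) for s in itemsets]


planted = [make_pattern("a", "bc", "d")]
mined = [
    make_pattern("a", "b"),        # partial match of the planted pattern
    make_pattern("a", "bc", "d"),  # exact match
    make_pattern("d", "a"),        # wrong order, so counted as spurious
]
result = evaluate(planted, mined)
```

Under these assumed definitions, a method that returns many overlapping fragments of the same planted pattern would show a high `redundant` count, which is the kind of behavior the abstract attributes to the support model.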