Towards bounding sequential patterns

Authors:
Chedy Raïssi; Jian Pei
Affiliations:
INRIA, Nancy, France; Simon Fraser University, Vancouver, BC, Canada
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

Given a sequence database, can we derive a non-trivial upper bound on the number of sequential patterns? Bounding sequential patterns is very challenging in theory because of the combinatorial complexity of sequences, even given some inspiring results on bounding itemsets in frequent itemset mining. The problem is also highly meaningful in practice, since such an upper bound can be used in many applications, for example space allocation when building sequence data warehouses. In this paper, we tackle the problem by presenting, for the first time in the field of sequential pattern mining, strong combinatorial results on computing the number of possible sequential patterns that can be generated at a given length k. As a case study, we introduce two novel techniques to estimate the number of candidate sequences. An extensive empirical study on both real and synthetic data verifies the effectiveness of our methods.
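
To make the counting question concrete, the following Python sketch computes a naive, database-independent count of the sequences of length k over n items. It is not the bound derived in the paper. It assumes the usual convention in which a sequence is an ordered list of non-empty itemsets and its length is the total number of items it contains. The function names count_sequences and enumerate_sequences are hypothetical and introduced only for illustration.

```python
from itertools import combinations
from math import comb


def count_sequences(n_items: int, k: int) -> int:
    """Count sequences of non-empty itemsets over n_items items with k items in total.

    Assumption (not taken from the paper): every itemset is a non-empty
    subset of the item alphabet, and length means the total item count.
    """
    counts = [1] + [0] * k  # counts[0] = 1 stands for the empty sequence
    for length in range(1, k + 1):
        # Pick the first itemset (size j), then any sequence of the remaining length.
        counts[length] = sum(
            comb(n_items, j) * counts[length - j]
            for j in range(1, min(length, n_items) + 1)
        )
    return counts[k]


def enumerate_sequences(items, k):
    """Brute-force enumeration of the same sequences, to cross-check small cases."""
    if k == 0:
        yield ()
        return
    for j in range(1, min(k, len(items)) + 1):
        for itemset in combinations(items, j):
            for rest in enumerate_sequences(items, k - j):
                yield (itemset,) + rest
```

For two items a and b and k = 2, count_sequences(2, 2) returns 5, matching the sequences (a)(a), (a)(b), (b)(a), (b)(b) and (ab). The count grows quickly with k, which is why a bound that takes the database into account is of practical interest.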