A new algorithm for gap constrained sequence mining

Authors:
Salvatore Orlando;Raffaele Perego;Claudio Silvestri
Affiliations:
Università Ca' Foscari, Venezia, Italy; Istituto ISTI, Consiglio Nazionale delle Ricerche (CNR), Pisa, Italy; Università Ca' Foscari, Venezia, Italy
Venue:
Proceedings of the 2004 ACM symposium on Applied computing
Year:
2004

Abstract

The sequence mining problem consists of finding frequent sequential patterns in a database of time-stamped events. Several application domains require limiting the maximum temporal gap between events occurring in the input sequences. However, pushing such a constraint down into the mining process is problematic for most sequence mining algorithms.

In this paper we describe CCSM (Cache-based Constrained Sequence Miner), a new level-wise algorithm that overcomes the difficulties usually associated with this kind of constraint. CCSM adopts an innovative approach based on k-way intersections of idlists to compute the support of candidate sequences. Our k-way intersection method is enhanced by an effective cache that stores intermediate idlists for future reuse. Reusing these intermediate results yields a surprising reduction in the number of join operations actually performed on idlists.

CCSM has been experimentally compared with cSPADE, a state-of-the-art algorithm, on several synthetically generated datasets, obtaining better or similar results in most cases.
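
The idea of counting support through gap-constrained idlist joins with a cache of intermediate results can be illustrated with a small Python sketch. This is not the CCSM implementation: the names `build_idlists`, `temporal_join`, and `GapConstrainedCounter` are hypothetical, every event is assumed to carry a single item, only a maximum-gap constraint is handled, and the cache is an unbounded dictionary keyed by pattern, which is a simplification chosen for illustration.

```python
from bisect import bisect_left


def build_idlists(database):
    """Map each item to {sequence id: sorted timestamps at which it occurs}."""
    idlists = {}
    for sid, events in database.items():
        for timestamp, item in sorted(events):
            idlists.setdefault(item, {}).setdefault(sid, []).append(timestamp)
    return idlists


def temporal_join(prefix_ends, item_times, max_gap):
    """Keep the item timestamps that follow a prefix occurrence within max_gap."""
    joined = {}
    for sid, ends in prefix_ends.items():
        kept = []
        for t in item_times.get(sid, []):
            # Look for a prefix end time e with t - max_gap <= e < t.
            i = bisect_left(ends, t - max_gap)
            if i < len(ends) and ends[i] < t:
                kept.append(t)
        if kept:
            joined[sid] = kept
    return joined


class GapConstrainedCounter:
    """Support counting by joining k item idlists, reusing cached prefixes."""

    def __init__(self, database, max_gap):
        self.idlists = build_idlists(database)
        self.max_gap = max_gap
        self.cache = {}   # pattern -> intermediate idlist (assumed unbounded)
        self.joins = 0    # number of joins actually performed

    def idlist(self, pattern):
        pattern = tuple(pattern)
        if len(pattern) == 1:
            return self.idlists.get(pattern[0], {})
        if pattern not in self.cache:
            prefix = self.idlist(pattern[:-1])  # reused when already cached
            last_item = self.idlists.get(pattern[-1], {})
            self.joins += 1
            self.cache[pattern] = temporal_join(prefix, last_item, self.max_gap)
        return self.cache[pattern]

    def support(self, pattern):
        """Number of input sequences containing the pattern under the gap limit."""
        return len(self.idlist(pattern))


# Toy database: sequence id -> list of (timestamp, item) events.
database = {
    1: [(1, "a"), (2, "b"), (4, "c")],
    2: [(1, "a"), (5, "b"), (6, "c")],
    3: [(1, "a"), (3, "b"), (9, "c")],
}
counter = GapConstrainedCounter(database, max_gap=2)
print(counter.support(("a", "b")))       # 2
print(counter.support(("a", "b", "c")))  # 1
print(counter.joins)                     # 2
```

In this sketch the second query performs only one join, because the idlist of the prefix `("a", "b")` is already in the cache. That is the kind of saving the abstract attributes to reusing intermediate idlists, shown here on a toy scale.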