GO-SPADE: mining sequential patterns over datasets with consecutive repetitions

Authors:
Marion Leleu;Christophe Rigotti;Jean-François Boulicaut;Guillaume Euvrard
Affiliations:
Laboratoire d'Ingénierie des Systèmes d'Information, Bâtiment Blaise Pascal, INSA Lyon, Villeurbanne Cedex, France and Informatique CDC, Bagneux, France;Laboratoire d'Ingénierie des Systèmes d'Information, Bâtiment Blaise Pascal, INSA Lyon, Villeurbanne Cedex, France;Laboratoire d'Ingénierie des Systèmes d'Information, Bâtiment Blaise Pascal, INSA Lyon, Villeurbanne Cedex, France;Informatique CDC, Bagneux, France
Venue:
MLDM'03 Proceedings of the 3rd international conference on Machine learning and data mining in pattern recognition
Year:
2003

Citing 10
Cited 6

Efficient enumeration of frequent sequences

Proceedings of the seventh international conference on Information and knowledge management
FreeSpan: frequent pattern-projected sequential pattern mining

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Sequence mining in categorical domains: incorporating constraints

Proceedings of the ninth international conference on Information and knowledge management
SPADE: an efficient algorithm for mining frequent sequences

Machine Learning
Discovery of Frequent Episodes in Event Sequences

Data Mining and Knowledge Discovery
Mining Sequential Patterns: Generalizations and Performance Improvements

EDBT '96 Proceedings of the 5th International Conference on Extending Database Technology: Advances in Database Technology
Mining Sequential Patterns

ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
PrefixSpan: Mining Sequential Patterns by Prefix-Projected Growth

Proceedings of the 17th International Conference on Data Engineering
The PSP Approach for Mining Sequential Patterns

PKDD '98 Proceedings of the Second European Symposium on Principles of Data Mining and Knowledge Discovery
Fast Algorithms for Mining Association Rules in Large Databases

VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

Discovering Frequent Arrangements of Temporal Intervals

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Relation rule mining

International Journal of Parallel, Emergent and Distributed Systems
Mining frequent trajectory patterns in spatial-temporal databases

Information Sciences: an International Journal
Mining frequent arrangements of temporal intervals

Knowledge and Information Systems
A resistive TCAM accelerator for data-intensive computing

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Closeness Preference - A new interestingness measure for sequential rules mining

Knowledge-Based Systems

Abstract

Databases of sequences can contain consecutive repetitions of items. This is the case in particular when some items represent discretized quantitative values. We show that on such databases, a typical algorithm such as SPADE tends to lose its efficiency. SPADE is based on the use of lists recording the locations of the occurrences of a pattern in the sequences, and these lists are not appropriate for data with repetitions. We introduce the concept of generalized occurrences and the corresponding primitive operators to manipulate them. We present an algorithm called GO-SPADE that extends SPADE to incorporate generalized occurrences. Finally, we present experiments showing that GO-SPADE can handle sequences containing consecutive repetitions at almost no extra cost.
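To make the contrast in the abstract concrete, the following minimal Python sketch compares a SPADE-style occurrence list with a compact, interval-based list. The abstract does not define generalized occurrences, so the (sequence id, first position, last position) encoding used here is an assumption for illustration only, and the toy database and the function names occurrence_list and generalized_occurrences are hypothetical rather than taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy database: sequence id -> items, one item per time step.
Database = Dict[int, List[str]]


def occurrence_list(db: Database, item: str) -> List[Tuple[int, int]]:
    """SPADE-style list: one (sequence id, position) entry per occurrence."""
    return [(sid, pos)
            for sid, seq in sorted(db.items())
            for pos, x in enumerate(seq)
            if x == item]


def generalized_occurrences(db: Database, item: str) -> List[Tuple[int, int, int]]:
    """Assumed compact form: one (sequence id, first, last) entry per run
    of consecutive positions that hold the item."""
    runs: List[Tuple[int, int, int]] = []
    for sid, pos in occurrence_list(db, item):
        if runs and runs[-1][0] == sid and runs[-1][2] == pos - 1:
            # Same sequence and adjacent position: extend the current run.
            runs[-1] = (sid, runs[-1][1], pos)
        else:
            # Otherwise start a new run of length one.
            runs.append((sid, pos, pos))
    return runs


db = {1: list("aaaab"), 2: list("baaca")}
plain = occurrence_list(db, "a")
compact = generalized_occurrences(db, "a")
print(len(plain), "plain entries vs", len(compact), "interval entries")
```

On this toy input the plain list grows with every repetition of the item, while the interval list grows only with the number of runs, which is the kind of saving the abstract attributes to generalized occurrences. The join operators that GO-SPADE would apply to such lists are not sketched here.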