Sampling for Sequential Pattern Mining: From Static Databases to Data Streams

Authors:
Chedy Raissi;Pascal Poncelet
Affiliations:
-;-
Venue:
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Year:
2007

Citing 0
Cited 3

A lower bound on the sample size needed to perform a significant frequent pattern mining task
Pattern Recognition Letters

Sampling for information and structure preservation when mining large data bases
IBERAMIA'10 Proceedings of the 12th Ibero-American conference on Advances in artificial intelligence

A simple, yet effective and efficient, sliding window sampling algorithm
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I

Abstract

Sequential pattern mining is an active field in the domain of knowledge discovery. Recently, with the constant progress in hardware technologies, real-world databases have tended to grow larger, and the hypothesis that a database can be loaded into main memory for sequential pattern mining is no longer valid. Furthermore, the new model of data as a continuous and potentially infinite flow, known as the data stream model, calls for a pre-processing step to ease the mining operations. Since database size is the most influential factor for mining algorithms, we examine the use of sampling over static databases to obtain approximate mining results with an upper bound on the error rate. Moreover, we extend this sampling analysis and present an algorithm based on reservoir sampling to cope with sequential pattern mining over data streams. We demonstrate with empirical results that our sampling methods are efficient and that sequence mining remains accurate over both static databases and data streams.
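
For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of classic uniform reservoir sampling (often called Algorithm R) applied to a stream of sequences. It is not the authors' algorithm, which the abstract only describes as "based on" reservoir sampling. The names `reservoir_sample`, `is_subsequence`, and `support` are hypothetical, and each sequence is simplified to a tuple of single items rather than a list of itemsets.

```python
import random
from typing import Iterable, List, Optional, Sequence, Tuple

# Simplifying assumption: a sequence is a tuple of single items,
# not a list of itemsets as in full sequential pattern mining.
Seq = Tuple[str, ...]


def reservoir_sample(stream: Iterable, k: int,
                     rng: Optional[random.Random] = None) -> List:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    if k <= 0:
        raise ValueError("k must be positive")
    rng = rng or random.Random()
    reservoir: List = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


def is_subsequence(pattern: Sequence, seq: Sequence) -> bool:
    """True if the items of pattern occur in seq in the same order, gaps allowed."""
    it = iter(seq)
    return all(x in it for x in pattern)


def support(sample: Sequence[Seq], pattern: Seq) -> float:
    """Fraction of sampled sequences that contain pattern as a subsequence."""
    if not sample:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sample) / len(sample)
```

A support value computed on the reservoir is only an estimate of the support over the full stream; bounding the error of that estimate as a function of sample size is the kind of analysis the abstract refers to and is not reproduced here.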