Statistical supports for mining sequential patterns and improving the incremental update process on data streams

Authors:
Pierre-Alain Laur;Jean-Emile Symphor;Richard Nock;Pascal Poncelet
Affiliations:
Grimaag-Dépt Scientifique Interfacultaire, Université Antilles-Guyane, Campus de Schoelcher, B.P. 7209, 97275 Schoelcher Cedex, Martinique, France. E-mail: {palaur,je.symphor,rnock}@mart ...;Grimaag-Dépt Scientifique Interfacultaire, Université Antilles-Guyane, Campus de Schoelcher, B.P. 7209, 97275 Schoelcher Cedex, Martinique, France. E-mail: {palaur,je.symphor,rnock}@mart ...;Grimaag-Dépt Scientifique Interfacultaire, Université Antilles-Guyane, Campus de Schoelcher, B.P. 7209, 97275 Schoelcher Cedex, Martinique, France. E-mail: {palaur,je.symphor,rnock}@mart ...;{LG2IP}-Ecole des Mines d'Alès, Site EERIE, parc scientifique Georges Besse, 30035 Nîmes Cedex, France. E-mail: pascal.poncelet@ema.fr
Venue:
Intelligent Data Analysis - Knowledge Discovery from Data Streams
Year:
2007

Abstract

Recently, the knowledge extraction community has taken a closer look at new models in which data arrive over time as a fast and continuous flow, i.e., data streams. Because only part of the stream can be stored, mining data streams for sequential patterns and updating previously found frequent patterns must cope with uncertainty. In this paper, we introduce a new statistical approach that biases the initial support for sequential patterns. This approach has the advantage of maximizing either the precision or the recall, as chosen by the user, while limiting the degradation of the other criterion. Moreover, these statistical supports help build statistical borders, which are the relevant sets of frequent patterns to use in an incremental mining process. From the statistical standpoint, theoretical results show that the technique is not far from the optimum. Experiments performed on sequential patterns demonstrate the interest of this approach and the potential of such techniques.
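
The idea of biasing the support threshold can be illustrated with a minimal sketch. The abstract does not state which statistical bound the paper uses, so the one-sided Hoeffding deviation below is an assumption made purely for illustration. The function names, the sample size, and the example frequencies are likewise hypothetical. Raising the user threshold by the deviation favors precision (patterns kept are frequent with high probability), while lowering it favors recall (truly frequent patterns are unlikely to be missed).

```python
import math


def hoeffding_epsilon(sample_size: int, delta: float) -> float:
    """One-sided Hoeffding deviation for a frequency estimated on a sample.

    Assumption for illustration only: the bound actually used in the paper
    is not given in the abstract. The guarantee holds for a single pattern;
    covering many patterns at once would need a smaller delta (union bound).
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(math.log(1.0 / delta) / (2.0 * sample_size))


def statistical_support(theta: float, sample_size: int, delta: float,
                        favor: str) -> float:
    """Shift the user threshold theta up (precision) or down (recall)."""
    eps = hoeffding_epsilon(sample_size, delta)
    if favor == "precision":
        return min(1.0, theta + eps)
    if favor == "recall":
        return max(0.0, theta - eps)
    raise ValueError("favor must be 'precision' or 'recall'")


def filter_patterns(observed: dict, threshold: float) -> set:
    """Keep the patterns whose observed frequency reaches the threshold."""
    return {p for p, freq in observed.items() if freq >= threshold}


# Hypothetical frequencies observed on a stored sample of 10,000 sequences.
observed = {"<a b>": 0.12, "<a c>": 0.10, "<b c>": 0.08}
theta, m, delta = 0.10, 10_000, 0.05

precision_set = filter_patterns(
    observed, statistical_support(theta, m, delta, "precision"))
recall_set = filter_patterns(
    observed, statistical_support(theta, m, delta, "recall"))
```

With these hypothetical numbers the deviation is about 0.012, so the precision-oriented threshold keeps only `<a b>`, while the recall-oriented threshold also keeps `<a c>`. Patterns falling between the two thresholds are the uncertain ones, which is one plausible reading of why such a band is useful to retain for an incremental update.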