Mining probabilistic automata: a statistical view of sequential pattern mining

Authors:
Stéphanie Jacquemont;François Jacquenet;Marc Sebban
Affiliations:
Laboratoire Hubert Curien, UMR 5516 Université Jean Monnet, Saint-Étienne, France 42000 (all three authors)
Venue:
Machine Learning
Year:
2009

Abstract

Over the past decade, sequential pattern mining has been at the core of numerous research efforts. It is now possible to efficiently extract knowledge of users' behavior from huge sets of sequences collected over time, with applications in domains such as supermarket purchases and Web site visits. However, sequence mining algorithms do little to control the risk of extracting false discoveries or overlooking true knowledge. This paper examines the theoretical conditions required for a relevant sequence mining process. It then offers a statistical view of sequence mining with three advantages. First, it uses a compact and generalized representation of the original sequences in the form of a probabilistic automaton. Second, it integrates statistical constraints to guarantee the extraction of significant patterns. Finally, it provides a solution suited to privacy-preserving contexts, where individuals' information must be protected. An application to car flow modeling is presented, showing the ability of our algorithm (ACSM) to discover frequent routes without using any private information. Comparisons with a classical sequence mining algorithm (SPAM) show the effectiveness of our approach.
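
To make the two central ideas of the abstract concrete, here is a minimal Python sketch. It is not the ACSM algorithm, whose details are not given in this record. It assumes the simplest possible probabilistic automaton (a prefix tree with counts, with no state merging or generalization) and a simple one-sided normal-approximation test for the statistical constraint. The function names, the example routes, and the default `z = 1.645` (roughly a 5% risk level) are all illustrative assumptions.

```python
import math
from collections import defaultdict


def build_prefix_tree(sequences):
    """Count how many sequences pass through each prefix.

    Each prefix acts as a state of a tree-shaped automaton; the counts
    are enough to derive transition probabilities later.
    """
    counts = defaultdict(int)
    for seq in sequences:
        for i in range(len(seq) + 1):
            counts[tuple(seq[:i])] += 1
    return counts


def transition_probability(counts, prefix, symbol):
    """Estimate P(symbol | prefix) from the prefix counts."""
    prefix = tuple(prefix)
    total = counts.get(prefix, 0)
    if total == 0:
        return 0.0
    return counts.get(prefix + (symbol,), 0) / total


def is_significantly_frequent(counts, prefix, min_support, z=1.645):
    """Keep a prefix only if the lower confidence bound of its observed
    proportion reaches min_support (hypothetical criterion for this sketch).

    Uses the normal approximation of a proportion, which is crude for
    small samples.
    """
    n = counts.get((), 0)  # total number of sequences
    if n == 0:
        return False
    p_hat = counts.get(tuple(prefix), 0) / n
    lower_bound = p_hat - z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return lower_bound >= min_support


# Illustrative data: four made-up routes through road segments.
routes = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["A", "E"]]
counts = build_prefix_tree(routes)
```

With these four routes, the prefix `A, B` has an observed frequency of 0.75, which a plain frequency threshold of 0.5 would accept. The statistical check rejects it, because four sequences are too few to conclude that the true frequency is at least 0.5. This is the kind of false discovery the abstract says classical sequence mining does little to control. Note also that the automaton stores only aggregated counts, not the individual sequences.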