Taxonomy-Driven Lumping for Sequence Mining
ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I
Given a taxonomy of events and a dataset of sequences of these events, we study the problem of finding efficient and effective ways to produce a compact representation of the sequences. We model sequences with Markov models whose states correspond to nodes in the provided taxonomy, and each state represents the events in the subtree under the corresponding node. By lumping observed events to states that correspond to internal nodes in the taxonomy, we allow more compact models that are easier to understand and visualize, at the expense of a decrease in the data likelihood. We formally define and characterize our problem, and we propose a scalable search method for finding a good trade-off between two conflicting goals: maximizing the data likelihood, and minimizing the model complexity. We implement these ideas in Taxomo, a taxonomy-driven modeler, which we apply in two different domains, query-log mining and mining of moving-object trajectories. The empirical evaluation confirms the feasibility and usefulness of our approach.
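The trade-off described in the abstract can be illustrated with a minimal sketch. It assumes a first-order Markov chain with maximum-likelihood estimates, an artificial `<start>` state, and a per-state emission term that accounts for which concrete event was observed inside a lumped state. These modelling choices, the toy taxonomy, and the names `lump` and `log_likelihood` are illustrative assumptions and are not taken from Taxomo itself.

```python
import math
from collections import Counter

def lump(event, parent, states):
    """Walk up the taxonomy from an event to the first node that is a model state."""
    node = event
    while node not in states:
        node = parent[node]  # raises KeyError if no chosen state covers the event
    return node

def log_likelihood(sequences, parent, states):
    """Log-likelihood of the data under a first-order Markov chain over `states`.

    Assumption: transitions and emissions use maximum-likelihood estimates
    computed from the same sequences that are being scored.
    """
    trans = Counter()   # (previous state, state) -> count
    out = Counter()     # previous state -> number of outgoing transitions
    emit = Counter()    # (state, event) -> count
    visits = Counter()  # state -> number of visits
    for seq in sequences:
        prev = "<start>"  # hypothetical start state
        for event in seq:
            s = lump(event, parent, states)
            trans[prev, s] += 1
            out[prev] += 1
            emit[s, event] += 1
            visits[s] += 1
            prev = s
    ll = sum(c * math.log(c / out[p]) for (p, _), c in trans.items())
    ll += sum(c * math.log(c / visits[s]) for (s, _), c in emit.items())
    return ll

# Toy taxonomy (child -> parent) and toy sequences, invented for illustration.
parent = {
    "pizza": "food", "sushi": "food",
    "hotel": "travel", "flight": "travel",
    "food": "root", "travel": "root",
}
sequences = [
    ["pizza", "sushi", "hotel"],
    ["sushi", "pizza", "flight"],
    ["pizza", "hotel", "flight"],
]

fine = {"pizza", "sushi", "hotel", "flight"}  # one state per leaf event
coarse = {"food", "travel"}                   # leaves lumped into internal nodes

ll_fine = log_likelihood(sequences, parent, fine)
ll_coarse = log_likelihood(sequences, parent, coarse)
print(len(fine), ll_fine)
print(len(coarse), ll_coarse)
```

Under these assumptions the lumped model is a constrained special case of the leaf-level model, so its log-likelihood cannot exceed that of the leaf-level model, while its state count is smaller. A search over taxonomy cuts would compare such pairs of likelihood and model size.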