Event sequences capture system and user activity over time. Prior research on sequence mining has mostly focused on discovering local patterns. Though interesting, these patterns reveal only local associations and fail to give a comprehensive summary of the entire event sequence. Moreover, the number of discovered patterns can be large. In this paper, we take an alternative approach and build short summaries that describe the entire sequence while still revealing local associations among events. We formally define summarization as an optimization problem that balances the shortness of the summary against the accuracy of the data description. We show that this problem can be solved optimally in polynomial time by combining two dynamic-programming algorithms. We also explore more efficient greedy alternatives and demonstrate that they work well on large datasets. Experiments on both synthetic and real datasets illustrate that our algorithms are efficient, produce high-quality results, and reveal interesting local structures in the data.
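To make the trade-off between summary shortness and description accuracy concrete, the following is a minimal sketch of a segmentation dynamic program. It is not the paper's algorithm: it assumes a single event type recorded as a 0/1 occurrence list, charges a hypothetical fixed `segment_penalty` for every segment in the summary, and measures accuracy as the number of bits needed to encode each segment under one occurrence probability. The names `segment_bits` and `summarize` are illustrative only.

```python
import math


def segment_bits(ones, length):
    """Bits to encode `length` binary slots holding `ones` events,
    using a single occurrence probability for the whole segment."""
    if ones == 0 or ones == length:
        return 0.0
    p = ones / length
    return -length * (p * math.log2(p) + (1 - p) * math.log2(1 - p))


def summarize(occurrences, segment_penalty=1.0):
    """Return (total_cost, segments) minimizing
    number_of_segments * segment_penalty + encoding bits.

    `occurrences` is a list of 0/1 flags, one per timestamp (assumed input
    format). Segments are half-open index ranges (start, end).
    """
    n = len(occurrences)

    # Prefix sums give the event count of any range in O(1).
    prefix = [0]
    for x in occurrences:
        prefix.append(prefix[-1] + x)

    best = [0.0] + [math.inf] * n   # best[j]: optimal cost of the first j slots
    back = [0] * (n + 1)            # back[j]: start of the last segment ending at j

    for j in range(1, n + 1):
        for i in range(j):
            cost = (best[i] + segment_penalty
                    + segment_bits(prefix[j] - prefix[i], j - i))
            if cost < best[j]:
                best[j] = cost
                back[j] = i

    # Walk the back-pointers to recover the segment boundaries.
    segments = []
    j = n
    while j > 0:
        segments.append((back[j], j))
        j = back[j]
    segments.reverse()
    return best[n], segments


if __name__ == "__main__":
    total, segments = summarize([0, 0, 0, 0, 1, 1, 1, 1])
    print(total, segments)
```

Raising `segment_penalty` favors shorter summaries with fewer segments, while lowering it favors a closer fit to the data. The double loop runs in quadratic time in the sequence length, which is the kind of cost that motivates greedy alternatives on large inputs.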