Discovering the most interesting patterns is the key problem in the field of pattern mining. While ranking or selecting patterns is well-studied for itemsets, it is surprisingly under-researched for other, more complex pattern types. In this paper we propose a new quality measure for episodes. An episode is essentially a set of events with possible restrictions on the order of the events. We say that an episode is significant if its occurrences are abnormally compact, that is, only a few gap events occur between the actual episode events, compared to the expected length under the independence model. We can apply this measure as a post-pruning step: first discover frequent episodes, then rank them according to the measure. Computing the score requires the mean and the variance of the window length under the independence model. As our main technical contribution we introduce a technique for computing these values. This task is surprisingly complex, and to solve it we develop intricate finite state machines that allow us to compute the needed statistics. We also show that asymptotically our score can be interpreted as a $P$-value. Our experiments demonstrate that despite its intricacy our ranking is fast: we can rank tens of thousands of episodes in seconds. Experiments with text data demonstrate that our measure ranks interpretable episodes high.
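To make the idea of the score concrete, the following is a minimal sketch, not the paper's method: the paper computes the mean and variance of the window length exactly with finite state machines, whereas this sketch assumes a much simpler independence model in which events are drawn i.i.d., so the wait for each symbol of a serial episode is geometric and the waits simply add up. All function names (`expected_window_stats`, `compactness_score`) are hypothetical, introduced here only for illustration.

```python
import math

def expected_window_stats(symbol_probs):
    # Simplifying assumption (not the paper's FSM computation): events are
    # i.i.d., so the wait for a symbol with probability p is geometric with
    # mean 1/p and variance (1 - p) / p**2. For a serial episode the waits
    # are independent, so means and variances add.
    mean = sum(1.0 / p for p in symbol_probs)
    var = sum((1.0 - p) / p ** 2 for p in symbol_probs)
    return mean, var

def compactness_score(observed_windows, symbol_probs):
    # Score an episode by how abnormally compact its observed minimal
    # windows are compared to the independence model: a large positive
    # z means the windows are much shorter than expected.
    n = len(observed_windows)
    obs_mean = sum(observed_windows) / n
    exp_mean, exp_var = expected_window_stats(symbol_probs)
    z = (exp_mean - obs_mean) / math.sqrt(exp_var / n)
    # Asymptotically the score can be read as a one-sided P-value
    # via the normal approximation.
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value
```

For example, a two-symbol serial episode with symbol probabilities 0.5 each has an expected window length of 4 under this toy model; observing several minimal windows of length 2 yields a positive z-score and a small asymptotic P-value, flagging the episode as abnormally compact.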