Reliable Detection of Episodes in Event Sequences

Authors:
Robert Gwadera; Mikhail Atallah; Wojciech Szpankowski
Venue:
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Year:
2003

Abstract

Suppose one wants to detect "bad" or "suspicious" subsequences in event sequences. Whether an observed pattern of activity (in the form of a particular subsequence) is significant and should be a cause for alarm depends on how likely it is to occur fortuitously. A long enough sequence of observed events will almost certainly contain any subsequence, and setting thresholds for alarm is an important issue in a monitoring system that seeks to avoid false alarms. Suppose a long sequence T of observed events contains a suspicious subsequence pattern S within it, where the suspicious subsequence S consists of m events and spans a window of size w within T. We address the fundamental problem: is a certain number of occurrences of a particular subsequence unlikely to be fortuitous (i.e., indicative of suspicious activity)? If the probability of fortuitous occurrences is high and an automated monitoring system flags it as suspicious anyway, then such a system will suffer from generating too many false alarms. This paper quantifies the probability of such an S occurring in T within a window of size w, the number of distinct windows containing S as a subsequence, the expected number of such occurrences, and its variance, and establishes its limiting distribution, which allows one to set up an alarm threshold so that the probability of false alarms is very small. We report on experiments confirming the theory and showing that we can detect bad subsequences with a low false alarm rate.
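
The quantity at the center of the abstract, the number of windows of size w in T that contain S as a subsequence, can be illustrated with a minimal brute-force sketch. The function names `contains_subsequence` and `count_windows` are hypothetical, events are assumed to be items of an ordinary Python sequence, and the code is only an illustration of the counted quantity; it is not the paper's algorithm and it does not compute the alarm threshold.

```python
def contains_subsequence(window, pattern):
    """Return True if pattern occurs in window as a subsequence,
    i.e. in order but not necessarily contiguously."""
    it = iter(window)
    # Each membership test consumes the iterator up to the match,
    # so later symbols must appear after earlier ones.
    return all(symbol in it for symbol in pattern)


def count_windows(events, pattern, w):
    """Count the length-w windows of events that contain pattern
    as a subsequence (brute force, O(n * w) time)."""
    if w < len(pattern) or w > len(events):
        return 0
    return sum(
        contains_subsequence(events[i:i + w], pattern)
        for i in range(len(events) - w + 1)
    )


# Example: T = "abcab", S = "ab", w = 3.
# Windows are "abc", "bca", "cab"; only "abc" and "cab" contain "ab".
print(count_windows("abcab", "ab", 3))  # → 2
```

A monitoring system in the spirit of the abstract would compare such a count against a threshold derived from the limiting distribution; that threshold is not reproduced here.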