Reliable detection of episodes in event sequences

Authors:
Robert Gwadera; Mikhail J. Atallah; Wojciech Szpankowski
Affiliations:
Purdue University, Department of Computer Science, W. Lafayette, IN 47907, USA (all three authors)
Venue:
Knowledge and Information Systems
Year:
2005

Abstract

Suppose one wants to detect bad or suspicious subsequences in event sequences. Whether an observed pattern of activity (in the form of a particular subsequence) is significant and should be a cause for alarm depends on how likely it is to occur by chance. A sufficiently long sequence of observed events will almost certainly contain any given subsequence, so setting alarm thresholds is an important issue for a monitoring system that seeks to avoid false alarms. Suppose a long sequence T of observed events contains a suspicious subsequence pattern S, where S consists of m events and spans a window of size w within T. We address the fundamental problem: is a certain number of occurrences of a particular subsequence unlikely to be generated by randomness alone (i.e., is it indicative of suspicious activity)? If the probability of a chance occurrence is high and an automated monitoring system flags it as suspicious anyway, the system will generate too many false alarms. This paper quantifies the probability of such an S occurring in T within a window of size w and the number of distinct windows containing S as a subsequence; it derives the expected number of such occurrences and its variance, and establishes the limiting distribution, which allows setting an alarm threshold so that the probability of a false alarm is very small. We report on experiments that confirm the theory and show that bad subsequences can be detected with a low false-alarm rate.
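
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: it assumes, purely for illustration, that events are drawn independently from a fixed distribution (the abstract does not state the source model), and the function names `count_windows_with_episode`, `window_probability` and `expected_window_count` are hypothetical. The sketch counts the windows of size w in T that contain S as a subsequence, computes the probability that one random window contains S, and derives the expected number of such windows by linearity of expectation. The variance and limiting distribution are not attempted here.

```python
from typing import Dict, Sequence


def count_windows_with_episode(T: Sequence[str], S: Sequence[str], w: int) -> int:
    """Count the windows T[i:i+w] that contain S as a subsequence."""
    def contains(window: Sequence[str]) -> bool:
        j = 0  # number of symbols of S matched so far (greedy left-to-right scan)
        for x in window:
            if j < len(S) and x == S[j]:
                j += 1
        return j == len(S)

    return sum(contains(T[i:i + w]) for i in range(len(T) - w + 1))


def window_probability(S: Sequence[str], w: int, probs: Dict[str, float]) -> float:
    """Probability that a window of w events contains S as a subsequence.

    Assumption (not from the source): events are i.i.d. with the
    distribution given by `probs`.
    """
    m = len(S)
    # state[j] = probability that exactly the first j symbols of S are matched
    state = [1.0] + [0.0] * m
    for _ in range(w):
        nxt = [0.0] * (m + 1)
        nxt[m] = state[m]  # a completed match stays completed
        for j in range(m):
            p = probs.get(S[j], 0.0)
            nxt[j + 1] += state[j] * p        # next needed symbol arrives
            nxt[j] += state[j] * (1.0 - p)    # any other symbol arrives
        state = nxt
    return state[m]


def expected_window_count(n: int, S: Sequence[str], w: int,
                          probs: Dict[str, float]) -> float:
    """Expected number of windows containing S in a random sequence of length n."""
    # Linearity of expectation holds even though overlapping windows are dependent.
    return max(n - w + 1, 0) * window_probability(S, w, probs)
```

Comparing the observed count from `count_windows_with_episode` with the value from `expected_window_count` gives a rough sense of whether the observed number of windows is larger than chance alone would suggest. A usable alarm threshold also needs the variance and limiting distribution that the paper derives.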