Suppose one wants to detect bad or suspicious subsequences in event sequences. Whether an observed pattern of activity (in the form of a particular subsequence) is significant and should be a cause for alarm depends on how likely it is to occur by chance. A long enough sequence of observed events will almost certainly contain any given subsequence, so setting alarm thresholds is an important issue for a monitoring system that seeks to avoid false alarms. Suppose a long sequence T of observed events contains a suspicious subsequence pattern S, where S consists of m events and spans a window of size w within T. We address the fundamental problem: is a certain number of occurrences of a particular subsequence unlikely to be generated by randomness itself (i.e., is it indicative of suspicious activity)? If the probability that an occurrence is generated by randomness is high and an automated monitoring system flags it as suspicious anyway, the system will generate too many false alarms. This paper quantifies the probability that such an S occurs in T within a window of size w, the number of distinct windows containing S as a subsequence, and the expected number of such occurrences together with its variance. It also establishes the limiting distribution of that number, which allows an alarm threshold to be set so that the probability of a false alarm is very small. We report on experiments that confirm the theory and show that bad subsequences can be detected with a low false-alarm rate.
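To make the counted quantity concrete, the following is a minimal sketch of counting the windows of size w in T that contain S as a subsequence, and of estimating an alarm threshold by simulation. It is not the paper's analytical method: the uniform memoryless event source, the Monte Carlo estimate, the rule "mean plus z standard deviations", and all function names (`is_subsequence`, `count_windows`, `simulate_counts`, `alarm_threshold`) are assumptions made for illustration only.

```python
import random
from statistics import mean, pstdev


def is_subsequence(pattern, window):
    """Return True if pattern occurs in window as a subsequence
    (events in order, not necessarily adjacent)."""
    it = iter(window)
    # "symbol in it" advances the iterator, so order is enforced.
    return all(symbol in it for symbol in pattern)


def count_windows(text, pattern, w):
    """Count start positions i such that text[i:i+w] contains pattern
    as a subsequence. Brute force: O(n * w) time."""
    return sum(
        is_subsequence(pattern, text[i:i + w])
        for i in range(len(text) - w + 1)
    )


def simulate_counts(alphabet, n, pattern, w, trials, seed=0):
    """Window counts for random sequences of length n.
    Assumption: events are drawn independently and uniformly."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        text = [rng.choice(alphabet) for _ in range(n)]
        counts.append(count_windows(text, pattern, w))
    return counts


def alarm_threshold(counts, z=3.0):
    """Empirical threshold: mean plus z standard deviations.
    Assumption: a stand-in for a threshold derived analytically."""
    return mean(counts) + z * pstdev(counts)


if __name__ == "__main__":
    baseline = simulate_counts("abcd", n=500, pattern="abc", w=6, trials=200)
    threshold = alarm_threshold(baseline)
    observed = count_windows("abcabcabc" * 50, "abc", 6)
    print("threshold:", threshold)
    print("observed:", observed, "alarm:", observed > threshold)
```

An observed count above the threshold would be flagged as unlikely under the random model. In practice the simulated baseline would be replaced by closed-form expressions for the mean and variance, which avoids the cost of repeated simulation.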