Suppose one wants to detect "bad" or "suspicious" subsequences in event sequences. Whether an observed pattern of activity (in the form of a particular subsequence) is significant and should be a cause for alarm depends on how likely it is to occur fortuitously. A long enough sequence of observed events will almost certainly contain any subsequence, and setting thresholds for alarm is an important issue in a monitoring system that seeks to avoid false alarms. Suppose a long sequence T of observed events contains a suspicious subsequence pattern S within it, where the suspicious subsequence S consists of m events and spans a window of size w within T. We address the fundamental problem: is a certain number of occurrences of a particular subsequence unlikely to be fortuitous (i.e., indicative of suspicious activity)? If the probability of fortuitous occurrences is high and an automated monitoring system flags it as suspicious anyway, then such a system will suffer from generating too many false alarms. This paper quantifies the probability of such an S occurring in T within a window of size w, the number of distinct windows containing S as a subsequence, the expected number of such occurrences, and its variance, and establishes its limiting distribution, which allows one to set up an alarm threshold so that the probability of false alarms is very small. We report on experiments confirming the theory and showing that we can detect bad subsequences with a low false alarm rate.
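To make the counted quantity concrete, the following is a minimal Python sketch, not the paper's method. It counts the windows of w consecutive events in T that contain S as a subsequence, and estimates how large that count gets by chance through simulation. The function names, the i.i.d. uniform event model, and the "mean plus k standard deviations" rule are all assumptions made for illustration; the paper derives the mean, variance, and limiting distribution analytically.

```python
import random
from statistics import mean, pstdev


def is_subsequence(pattern, window):
    """Return True if `pattern` occurs in `window` as a subsequence (gaps allowed)."""
    it = iter(window)
    # `symbol in it` advances the iterator, so matches must appear in order.
    return all(symbol in it for symbol in pattern)


def count_windows(text, pattern, w):
    """Count the windows of `w` consecutive events in `text` containing `pattern`."""
    return sum(
        is_subsequence(pattern, text[i:i + w])
        for i in range(len(text) - w + 1)
    )


def simulate_counts(pattern, w, n, alphabet, trials=200, seed=0):
    """Window counts for random texts of length `n`.

    Assumption: events are i.i.d. and uniform over `alphabet`.
    """
    rng = random.Random(seed)
    return [
        count_windows([rng.choice(alphabet) for _ in range(n)], pattern, w)
        for _ in range(trials)
    ]


def alarm_threshold(counts, k=3.0):
    """Illustrative rule (an assumption): empirical mean plus `k` standard deviations."""
    return mean(counts) + k * pstdev(counts)
```

A monitoring loop under these assumptions would call `simulate_counts` once for a given S, w, and text length, then raise an alarm only when `count_windows` on the observed sequence exceeds `alarm_threshold`. A larger `k` trades missed detections for fewer false alarms.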