Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
We present a set of novel algorithms, collectively called sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences arising from recordings of switch sensors in the cockpits of commercial airliners. Although the algorithms are general and domain-independent, we focus on a specific problem that is critical to determining the system-wide health of a fleet of aircraft. The approach clusters the sequences without supervision, using the normalized length of the longest common subsequence as the similarity measure, and then performs detailed outlier analysis to detect anomalies. In this method, an outlier is defined as a sequence that is far away from the cluster center. We present new outlier-analysis algorithms that give comprehensible indicators of why a particular sequence is deemed an outlier, providing an analyst with a coherent description of the anomalies in the sequence relative to more normal sequences. In the final section of the paper, we demonstrate the effectiveness of sequenceMiner for anomaly detection on a real set of discrete-sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our approach with standard Hidden Markov Models and show that our methods are superior.
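The similarity measure and the outlier definition described above can be illustrated with a minimal Python sketch. It is not the sequenceMiner implementation. The normalization by the geometric mean of the two sequence lengths, the use of a medoid as the "cluster center", and all function names (`lcs_length`, `nlcs`, `medoid`, `rank_outliers`) are assumptions made for illustration; the abstract does not specify them.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def nlcs(a, b):
    """Normalized LCS similarity in [0, 1].

    Assumption: normalize by the geometric mean of the two lengths.
    """
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))


def medoid(seqs):
    """Assumption: take the cluster center to be the member most similar to all others."""
    return max(seqs, key=lambda s: sum(nlcs(s, t) for t in seqs))


def rank_outliers(seqs):
    """Order the sequences of one cluster from least to most similar to its center."""
    center = medoid(seqs)
    return sorted(seqs, key=lambda s: nlcs(s, center))
```

Calling `rank_outliers` on the sequences of a single cluster places the candidate outliers first. This quadratic-time sketch only illustrates the idea of ranking by distance from the center; it makes no attempt at the scalability or the explanatory diagnostics that the abstract attributes to sequenceMiner.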