We propose a new method for detecting patterns of anomalies in categorical datasets. We assume that anomalies are generated by some underlying process that affects only a particular subset of the data. Our method consists of two steps: we first use a "local anomaly detector" to identify individual records with anomalous attribute values, and then detect patterns in which the number of anomalous records is higher than expected. Given the set of anomalies flagged by the local anomaly detector, we search over all subsets of the data defined by fixing the values of some subset of the attributes, in order to detect self-similar patterns of anomalies. We wish to detect any such subset of the test data that displays a significant increase in anomalous activity compared to the normal behavior of the system (as indicated by the training data). We perform significance testing to determine whether the number of anomalies in any subset of the test data is significantly higher than expected, and propose an efficient algorithm to perform this test over all such subsets of the data. We show that this algorithm accurately detects anomalous patterns in real-world hospital, container shipping, and network intrusion data.
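The second step can be illustrated with a minimal brute-force sketch in Python. It assumes each record has already been flagged by some local anomaly detector and is given as a `(record, flagged)` pair, where `record` is a dict of categorical attributes. The function names (`fisher_upper_tail`, `matches`, `scan_subsets`), the `max_attrs` cap, and the choice of a one-sided Fisher's exact test are assumptions made for illustration. The sketch enumerates every subset naively and applies no multiple-testing correction, so it is not the efficient algorithm the abstract refers to.

```python
from itertools import combinations
from math import comb


def fisher_upper_tail(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table
    [[a, b], [c, d]] = [[test flagged, test unflagged],
                        [train flagged, train unflagged]].
    Returns P(X >= a) under the hypergeometric null."""
    flagged, test_size, total = a + c, a + b, a + b + c + d
    denom = comb(total, test_size)
    tail = sum(
        comb(flagged, k) * comb(total - flagged, test_size - k)
        for k in range(a, min(flagged, test_size) + 1)
    )
    return tail / denom


def matches(record, rule):
    """True if the record has every attribute=value pair in the rule."""
    return all(record.get(attr) == val for attr, val in rule)


def scan_subsets(train, test, max_attrs=2):
    """Score every subset defined by fixing up to max_attrs attribute
    values (hypothetical cap), using only rules that occur in the test data.
    Returns (p_value, rule) pairs, most significant first."""
    rules = set()
    for record, _ in test:
        items = sorted(record.items())
        for size in range(1, max_attrs + 1):
            rules.update(combinations(items, size))

    results = []
    for rule in rules:
        a = sum(f for r, f in test if matches(r, rule))
        b = sum(not f for r, f in test if matches(r, rule))
        c = sum(f for r, f in train if matches(r, rule))
        d = sum(not f for r, f in train if matches(r, rule))
        results.append((fisher_upper_tail(a, b, c, d), rule))
    return sorted(results, key=lambda pair: pair[0])
```

Calling `scan_subsets(train, test)` returns the candidate subsets ordered by p-value, so the first entries are the attribute-value combinations whose share of flagged test records is most elevated relative to the training data. Because many subsets are tested at once, the raw p-values would need a multiple-testing adjustment before being read as significance levels.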