Spotting anomalies in large multi-dimensional databases is a crucial task with many applications in finance, health care, security, and other domains. We introduce COMPREX, a new approach for identifying anomalies using pattern-based compression. Informally, our method finds a collection of dictionaries that succinctly describe the norm of a database, and then flags as anomalies those points that are dissimilar to the norm, that is, points with a high compression cost. Our approach exhibits four key features: 1) it is parameter-free: it builds dictionaries directly from the data and requires no user-specified parameters such as distance functions or density and similarity thresholds; 2) it is general: we show that it works for a broad range of complex databases, including graph, image, and relational databases that may contain both categorical and numerical features; 3) it is scalable: its running time grows linearly with both database size and number of dimensions; and 4) it is effective: experiments on a broad range of datasets show large improvements in both compression and anomaly-detection precision over state-of-the-art competitors.
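The idea of flagging points with high compression cost can be illustrated with a minimal sketch. This is not the COMPREX algorithm itself. As a simplifying assumption, each column gets its own dictionary (a frequency table), and a value with relative frequency p is charged -log2(p) bits, the Shannon-optimal code length. The function name `compression_costs` and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def compression_costs(rows):
    """Return the encoding cost, in bits, of each row.

    Assumption for illustration: one dictionary per column, where a value
    seen with relative frequency p costs -log2(p) bits to encode.
    """
    n = len(rows)
    num_cols = len(rows[0])
    # Build one frequency table ("dictionary") per column.
    tables = [Counter(row[j] for row in rows) for j in range(num_cols)]
    costs = []
    for row in rows:
        # Total cost of a row is the sum of its per-column code lengths.
        bits = sum(-math.log2(tables[j][row[j]] / n) for j in range(num_cols))
        costs.append(bits)
    return costs


# Toy categorical database: eight typical rows and two unusual ones.
rows = [("a", "x")] * 8 + [("a", "y")] + [("b", "z")]
costs = compression_costs(rows)

# Rank row indices from most to least expensive to encode.
ranked = sorted(range(len(rows)), key=lambda i: costs[i], reverse=True)
```

In this toy database the row `("b", "z")` uses a rare value in both columns, so it is the most expensive to encode and comes first in `ranked`, followed by `("a", "y")`. No distance function or threshold is specified by hand; the ranking follows from the frequencies in the data alone.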