LOF: identifying density-based local outliers
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Algorithms for Mining Distance-Based Outliers in Large Datasets
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Finding Intensional Knowledge of Distance-Based Outliers
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Adaptive Ripple Down Rules Method based on Minimum Description Length Principle
ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
On Digital Money and Card Technologies
On Digital Money and Card Technologies
Towards parameter-free data mining
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Robust information-theoretic clustering
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Simultaneous Classification and VisualWord Selection using Entropy-based Minimum Description Length
ICPR '06 Proceedings of the 18th International Conference on Pattern Recognition - Volume 01
Outlier-robust clustering using independent components
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
ACM Computing Surveys (CSUR)
IEEE Transactions on Information Theory
Spatially adaptive wavelet denoising using the minimum description length principle
IEEE Transactions on Image Processing
Synchronization based outlier detection
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Parameter-free anomaly detection for categorical data
MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
Measuring non-gaussianity by phi-transformed and fuzzy histograms
Advances in Artificial Neural Systems - Special issue on Advances in Unsupervised Learning Techniques Applied to Biosciences and Medicine
Outlier detection using centrality and center-proximity
Proceedings of the 21st ACM international conference on Information and knowledge management
Hi-index | 0.00 |
How can we automatically spot all outstanding observations in a data set? This question arises in a large variety of applications, e.g. in economy, biology and medicine. Existing approaches to outlier detection suffer from one or more of the following drawbacks: The results of many methods strongly depend on suitable parameter settings being very difficult to estimate without background knowledge on the data, e.g. the minimum cluster size or the number of desired outliers. Many methods implicitly assume Gaussian or uniformly distributed data, and/or their result is difficult to interpret. To cope with these problems, we propose CoCo, a technique for parameter-free outlier detection. The basic idea of our technique relates outlier detection to data compression: Outliers are objects which can not be effectively compressed given the data set. To avoid the assumption of a certain data distribution, CoCo relies on a very general data model combining the Exponential Power Distribution with Independent Components. We define an intuitive outlier factor based on the principle of the Minimum Description Length together with an novel algorithm for outlier detection. An extensive experimental evaluation on synthetic and real world data demonstrates the benefits of our technique. Availability: The source code of CoCo and the data sets used in the experiments are available at: http://www.dbs.ifi.lmu.de/Forschung/KDD/Boehm/CoCo.