Current outlier detection schemes typically output a numeric score representing the degree to which a given observation is an outlier. We argue that converting these scores into well-calibrated probability estimates is preferable for several reasons. First, probability estimates allow us to select an appropriate threshold for declaring outliers using a Bayesian risk model. Second, probability estimates obtained from individual models can be aggregated to build an ensemble outlier detection framework. In this paper, we present two methods for transforming outlier scores into probabilities. The first approach assumes that the posterior probabilities follow a logistic sigmoid function and learns the parameters of the function from the distribution of outlier scores. The second approach models the score distribution as a mixture of exponential and Gaussian probability functions and computes the posterior probabilities via Bayes' rule. We evaluate the efficacy of both methods in the context of threshold selection and ensemble outlier detection, and we show that calibration accuracy improves with the aid of a few labeled examples.
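As a rough illustration of the first approach, the sketch below fits a logistic sigmoid P(outlier | s) = 1 / (1 + exp(a*s + b)) to raw outlier scores by maximum likelihood. It assumes a handful of labeled examples are available (the semi-supervised setting mentioned at the end of the abstract); the function name fit_sigmoid, the initialization, and the Nelder-Mead optimizer are choices of this sketch, not details taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def fit_sigmoid(scores, labels):
    """Fit P(outlier | s) = 1 / (1 + exp(a * s + b)) by maximum likelihood.

    `scores` are raw outlier scores; `labels` are 1 for known outliers and
    0 for known inliers (a hypothetical semi-supervised setup).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)

    def nll(params):
        # Negative log-likelihood of the labels under the sigmoid model.
        a, b = params
        p = 1.0 / (1.0 + np.exp(a * scores + b))
        eps = 1e-12  # guard against log(0)
        return -np.sum(labels * np.log(p + eps)
                       + (1.0 - labels) * np.log(1.0 - p + eps))

    # Start with a negative slope so higher scores map to higher probability.
    res = minimize(nll, x0=np.array([-1.0, 0.0]), method="Nelder-Mead")
    a, b = res.x
    return lambda s: 1.0 / (1.0 + np.exp(a * np.asarray(s, dtype=float) + b))
```

A fitted calibrator is then used directly, e.g. `prob = fit_sigmoid(train_scores, train_labels)(new_scores)`, and the resulting probabilities can be thresholded under a Bayesian risk model or averaged across detectors in an ensemble.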
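The second approach can likewise be sketched as a two-component mixture fit with EM: an exponential component for the scores of normal observations, a Gaussian component for the scores of outliers, and the posterior outlier probability obtained via Bayes' rule. Which component plays which role, the initialization heuristic, and the assumption of non-negative scores are all choices of this sketch rather than details from the paper.

```python
import numpy as np

def fit_mixture(scores, n_iter=100):
    """EM for a two-component score mixture: Exponential (inliers) + Gaussian (outliers).

    Returns a function mapping a score to the posterior outlier probability
    via Bayes' rule. Assumes non-negative scores (required by the
    exponential component).
    """
    s = np.asarray(scores, dtype=float)
    # Heuristic initialization: place the Gaussian in the upper tail of the scores.
    alpha = 0.1                                   # prior P(outlier)
    lam = 1.0 / np.mean(s)                        # exponential rate for inlier scores
    mu, sigma = np.percentile(s, 95), np.std(s)   # Gaussian for outlier scores

    for _ in range(n_iter):
        # E-step: responsibility of the Gaussian (outlier) component for each score.
        g = alpha * np.exp(-0.5 * ((s - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        e = (1.0 - alpha) * lam * np.exp(-lam * s)
        r = g / (g + e + 1e-12)
        # M-step: re-estimate the mixture parameters from the responsibilities.
        alpha = r.mean()
        mu = np.sum(r * s) / (np.sum(r) + 1e-12)
        sigma = np.sqrt(np.sum(r * (s - mu) ** 2) / (np.sum(r) + 1e-12)) + 1e-12
        lam = np.sum(1.0 - r) / (np.sum((1.0 - r) * s) + 1e-12)

    def posterior(x):
        # Bayes' rule: P(outlier | x) from the fitted component densities.
        x = np.asarray(x, dtype=float)
        g = alpha * np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        e = (1.0 - alpha) * lam * np.exp(-lam * x)
        return g / (g + e + 1e-12)

    return posterior
```

Unlike the sigmoid sketch, this variant needs no labels at all: `posterior = fit_mixture(scores)` calibrates from the score distribution alone, which matches the unsupervised setting the abstract emphasizes.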