Unsupervised ensemble learning for mining top-n outliers

Authors:
Jun Gao;Weiming Hu;Zhongfei(Mark) Zhang;Ou Wu
Affiliations:
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China;National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China;Dept. of Computer Science, State Univ. of New York at Binghamton, Binghamton, NY;National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Venue:
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Year:
2012


Abstract

Outlier detection is an important and attractive problem in knowledge discovery in large datasets. Rather than deciding whether an individual object is an outlier, we study the detection of the n most outstanding outliers, i.e., top-n outlier detection. Further, we consider the problem of combining the top-n outlier lists produced by various individual detection methods. A general framework of ensemble learning for top-n outlier detection is proposed, based on rank aggregation techniques. To accommodate the differing scales and characteristics of outlier scores from different detection methods, two approaches are proposed: a score-based aggregation approach that normalizes the outlier scores, and an order-based aggregation approach built on the distance-based Mallows model. Extensive experiments on several real datasets demonstrate that the proposed approaches consistently deliver stable and effective performance across datasets, with good scalability, in comparison with state-of-the-art methods.
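
To make the two aggregation styles concrete, here is a minimal Python sketch. It is not the paper's method: the abstract does not specify the normalization, so z-score standardization is assumed here, and a simple Borda count stands in for the distance-based Mallows model as the order-based aggregator. The function names and the toy detector outputs are hypothetical.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize one detector's scores (assumed normalization, not the paper's)."""
    mu, sigma = mean(scores.values()), pstdev(scores.values())
    if sigma == 0:
        return {obj: 0.0 for obj in scores}
    return {obj: (value - mu) / sigma for obj, value in scores.items()}


def score_based_top_n(detector_scores, n):
    """Score-based aggregation: average the normalized scores, keep the n largest."""
    normalized = [zscore(scores) for scores in detector_scores]
    combined = {obj: mean(z[obj] for z in normalized) for obj in normalized[0]}
    # Ties are broken by object name so the result is deterministic.
    return sorted(combined, key=lambda obj: (-combined[obj], obj))[:n]


def borda_top_n(detector_scores, n):
    """Order-based aggregation via Borda count (a stand-in for the Mallows model).

    Only the rank positions are used, so the scale of each detector's scores
    does not matter.
    """
    points = {}
    for scores in detector_scores:
        ranking = sorted(scores, key=lambda obj: (-scores[obj], obj))
        for position, obj in enumerate(ranking):
            points[obj] = points.get(obj, 0) + (len(ranking) - position)
    return sorted(points, key=lambda obj: (-points[obj], obj))[:n]


# Hypothetical outputs of two detectors on very different scales
# (higher score means more outlying).
detector_a = {"p1": 0.9, "p2": 0.1, "p3": 0.8, "p4": 0.2}
detector_b = {"p1": 120.0, "p2": 15.0, "p3": 40.0, "p4": 10.0}

top_by_score = score_based_top_n([detector_a, detector_b], n=2)
top_by_order = borda_top_n([detector_a, detector_b], n=2)
```

The score-based variant keeps the magnitude information in each detector's output but depends on the normalization chosen, whereas the order-based variant discards magnitudes and relies only on the rankings.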