Example-based robust DB-outlier detection for high dimensional data

Authors:
Yuan Li; Hiroyuki Kitagawa
Affiliations:
Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan; Center for Computational Sciences, University of Tsukuba, Tsukuba, Ibaraki, Japan
Venue:
DASFAA'08 Proceedings of the 13th international conference on Database systems for advanced applications
Year:
2008

Abstract

This paper presents an outlier detection method that identifies exceptional objects matching user intentions in high dimensional datasets. Outlier detection is a crucial element of many applications, such as financial analysis and fraud detection. Although the problem has been studied extensively, existing methods fail to discover outliers directly in high dimensional datasets because of the curse of dimensionality. In addition, many algorithms require several decisive parameters to be set in advance, and such parameters are difficult to determine without prior knowledge of the dataset. To address these problems, we take an example-based approach and examine the behavior of projections of the outlier examples in a dataset. An example-based approach is promising because users can often provide a few outlier examples that suggest what they want to detect. Importantly, the method should remain robust even if the user-provided examples include noise or inconsistencies. Our proposed method is based on the notion of distance-based (DB) outliers. Experiments demonstrate that the proposed method is effective and efficient on both synthetic and real datasets and can tolerate noisy examples.
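
The abstract does not spell out what a DB-outlier is. As a minimal sketch, assuming the commonly used DB(p, D) formulation (an object is an outlier if at least a fraction p of the objects in the dataset lie farther than distance D from it), the basic test could look like the following. The function name, the Euclidean metric, the toy data, and the parameter values are all illustrative assumptions. This is not the example-based method the paper proposes, which additionally examines projections of user-supplied outlier examples.

```python
import math


def is_db_outlier(point, data, p, d):
    """Check the assumed DB(p, D) condition for one object.

    Returns True if at least a fraction p of the objects in `data`
    lie farther than distance d from `point` (Euclidean distance).
    The object itself is counted in the dataset, at distance 0.
    """
    far = sum(1 for q in data if math.dist(point, q) > d)
    return far >= p * len(data)


# Hypothetical toy dataset: four clustered points and one distant point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]

# Parameters p and d are chosen by hand for illustration only.
outliers = [x for x in data if is_db_outlier(x, data, p=0.75, d=1.0)]
```

Even in this toy form, the result depends entirely on the hand-picked values of p and d, which illustrates the parameter-selection difficulty the abstract points out.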