Finding key knowledge attribute subspace of outliers in high-dimensional dataset

Authors:
Biao Huang;Peng Yang
Affiliations:
College of Computer Science, Chongqing University of Arts and Sciences, Chongqing, China;College of Computer Science, Chongqing University of Arts and Sciences, Chongqing, China and College of Computer Science, Chongqing University, Chongqing, China
Venue:
Expert Systems with Applications: An International Journal
Year:
2011


Abstract

Outlier detection has important applications in many fields where the data are high-dimensional. However, finding the intensional knowledge of outliers becomes inefficient, and even infeasible, in high-dimensional space. In this paper, we introduce the concept of rough sets and use it as the model of an outlier detection and analysis system to realize outlying reduction. Furthermore, by defining outlying partition similarity, we can mine outliers in the key knowledge attribute subspace rather than in the full-dimensional attribute space of the dataset. We propose an effective method for finding the key knowledge attribute subspace. It first finds all outliers in the full attribute space and then calculates KAS for the corresponding projection of each outlier. Finally, the key knowledge attribute subspace is identified by the value of the outlying partition similarity. The experimental results show that our method can efficiently identify outliers in high-dimensional datasets.
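
The abstract gives the workflow only in outline, so the following is a minimal sketch of its general shape and not the authors' algorithm. It assumes a k-th-nearest-neighbour distance score as the outlier detector and a Jaccard overlap between outlier sets as a stand-in for outlying partition similarity; the paper's rough-set definitions are not reproduced here. All function names, parameters, and the sample data are hypothetical.

```python
from itertools import combinations
from math import dist


def knn_outliers(points, dims, k=2, n_outliers=1):
    """Indices of the n_outliers points with the largest k-th NN distance,
    measured only on the attributes listed in `dims` (assumed detector)."""
    proj = [tuple(p[d] for d in dims) for p in points]
    scores = []
    for i, p in enumerate(proj):
        dists = sorted(dist(p, q) for j, q in enumerate(proj) if j != i)
        scores.append((dists[k - 1], i))
    scores.sort(reverse=True)
    return {i for _, i in scores[:n_outliers]}


def jaccard(a, b):
    """Set overlap, used here as a placeholder similarity measure."""
    return len(a & b) / len(a | b) if a | b else 1.0


def find_key_subspace(points, k=2, n_outliers=1, threshold=1.0):
    """Return the first smallest subspace whose outlier set is similar
    enough to the outlier set found in the full attribute space."""
    n_dims = len(points[0])
    # Step 1: find the outliers in the full attribute space.
    full = knn_outliers(points, range(n_dims), k, n_outliers)
    # Step 2: score each projection, smallest subspaces first.
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            sim = jaccard(full, knn_outliers(points, dims, k, n_outliers))
            # Step 3: accept the subspace once the similarity is high enough.
            if sim >= threshold:
                return dims, sim, full
    return tuple(range(n_dims)), 1.0, full


# Hypothetical data: ten regular points plus one that deviates in attribute 0.
points = [(i * 0.1, (i % 3) * 0.1, (i % 2) * 0.1) for i in range(10)]
points.append((10.0, 0.1, 0.0))

key_dims, similarity, full_outliers = find_key_subspace(points)
print(key_dims, similarity, full_outliers)
```

The exhaustive enumeration of subspaces is exponential in the number of attributes, so it is suitable only for illustration on a handful of dimensions; avoiding that cost is the motivation for a reduction-based approach such as the one the abstract describes.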