Finding key knowledge attribute subspace of outliers in high-dimensional dataset

  • Authors:
  • Biao Huang; Peng Yang

  • Affiliations:
  • College of Computer Science, Chongqing University of Arts and Sciences, Chongqing, China; College of Computer Science, Chongqing University of Arts and Sciences, Chongqing, China and College of Computer Science, Chongqing University, Chongqing, China

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011

Quantified Score

Hi-index 12.05

Abstract

Outlier detection has important applications in many fields where the data can be high-dimensional. However, finding the intensional knowledge of outliers becomes inefficient, and even infeasible, in high-dimensional space. In this paper, we introduce the concept of rough sets and use it as the model for an outlier detection and analysis system in order to realize outlying reduction. Furthermore, by defining outlying partition similarity, we can mine outliers in the key knowledge attribute subspace rather than in the full-dimensional attribute space of the dataset. We propose an effective method for finding the key knowledge attribute subspace. It first finds all outliers in the full attribute space and then calculates the KAS for the corresponding projection of each outlier. Finally, the key knowledge attribute subspace is identified by the value of the outlying partition similarity. The experimental results show that our method can be used efficiently to identify outliers in high-dimensional datasets.
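
The abstract does not define outlying partition similarity or the KAS computation, so the following is only a minimal sketch of the general rough-set idea behind attribute reduction, not the authors' method. It assumes the outliers have already been found in the full attribute space, treats "outlier / not outlier" as the decision, and searches for the smallest attribute subset whose indiscernibility partition separates outliers as well as the full attribute set does (the classical positive-region criterion, used here as a stand-in). All function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def partition(rows, attrs):
    """Indiscernibility classes: row indices that agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())


def positive_region(rows, attrs, outliers):
    """Rows whose class is pure: all outliers or all non-outliers."""
    pos = set()
    for block in partition(rows, attrs):
        if block <= outliers or block.isdisjoint(outliers):
            pos |= block
    return pos


def smallest_preserving_subspace(rows, outliers):
    """Smallest attribute subset with the same positive region as the full set.

    Exhaustive search, exponential in the number of attributes; this is an
    illustration only, not the procedure described in the paper.
    """
    all_attrs = tuple(range(len(rows[0])))
    target = positive_region(rows, all_attrs, outliers)
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if positive_region(rows, attrs, outliers) == target:
                return attrs
    return all_attrs


# Hypothetical toy data: row 3 is assumed to be the outlier found in the full space.
rows = [
    ("low", "a", "x"),
    ("low", "b", "x"),
    ("low", "a", "y"),
    ("high", "b", "y"),
]
outliers = {3}
subspace = smallest_preserving_subspace(rows, outliers)
```

In this toy example the first attribute alone is enough to separate the assumed outlier from the other rows, so the search stops at a one-attribute subspace. A similarity-based criterion such as the one the abstract mentions would replace the equality test on positive regions.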