A genetic approach for efficient outlier detection in projected space

  • Authors:
  • Sanghamitra Bandyopadhyay;Santanu Santra

  • Affiliations:
  • Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India;Machine Intelligence Unit, Indian Statistical Institute, Kolkata 700108, India

  • Venue:
  • Pattern Recognition
  • Year:
  • 2008

Quantified Score

Hi-index 0.01

Visualization

Abstract

In this paper we present a genetic solution to the outlier detection problem. The essential idea behind this technique is to define outliers by examining those projections of the data, along which the data points have abnormal or inconsistent behavior (defined in terms of their sparsity values). We use a partitioning method to divide the data set into groups such that all the objects in a group can be considered to behave similarly. We then identify those groups that contain outliers. The algorithm assigns an 'outlier-ness' value that gives a relative measure of how strong an outlier group is. An evolutionary search computation technique is employed for determining those projections of the data over which the outliers can be identified. A new data structure, called the grid count tree (GCT), is used for efficient computation of the sparsity factor. GCT helps in quickly determining the number of points within any grid defined over the projected space and hence facilitates faster computation of the sparsity factor. A new crossover is also defined for this purpose. The proposed method is applicable for both numeric and categorical attributes. The search complexity of the GCT traversal algorithm is provided. Results are demonstrated for both artificial and real life data sets including four gene expression data sets.