Clustering Needles in a Haystack: An Information Theoretic Analysis of Minority and Outlier Detection

  • Authors:
  • Shin Ando

  • Affiliations:
  • -

  • Venue:
  • ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Identifying atypical objects is one of the traditional topics in machine learning. Recently, novel approaches, e.g., Minority Detection and One-class clustering, have explored further to identify clusters of atypical objects which strongly contrast from the rest of the data in terms of their distribution or density. This paper analyzes such tasks from an information theoretic perspective. Based on Information Bottleneck formalization, these tasks interpret to increasing the averaged atypicalness of the clusters while reducing the complexity of the clustering. This formalization yields a unifying view of the new approaches as well as the classic outlier detection. We also present a scalable minimization algorithm which exploits the localized form of the cost function over individual clusters. The proposed algorithm is evaluated using simulated datasets and a text classification benchmark, in comparison with an existing method.