Non-redundant clustering

  • Authors:
  • Thomas Hofmann;David Gondek

  • Affiliations:
  • Brown University;Brown University

  • Venue:
  • Non-redundant clustering
  • Year:
  • 2005

Abstract

Data mining and knowledge discovery attempt to reveal concepts, patterns, relationships, and structures of interest in data. Typically, a data set contains many such structures, yet most existing data mining techniques give the user little say in which structure the search returns. The techniques that do allow the user to control the search typically require supervised information in the form of knowledge about a target solution. In the spirit of exploratory data mining, we consider the setting where the user has no information about a target solution. Instead, we suppose the user can provide information about solutions that are not desired. These undesired solutions may have been obtained previously from data mining algorithms, or they may be known to the user a priori. The goal is then to discover novel structure in the data set that is not redundant with respect to the known structure. Techniques should guide the search away from this known structure and towards novel, interesting structures. We describe and formally define the task of non-redundant clustering. We derive three different algorithmic approaches to non-redundant clustering and evaluate their performance experimentally on data sets containing multiple clusterings. We explore how these techniques may be extended to systematically enumerate clusterings in a data set. Finally, we investigate whether non-redundant approaches may be incorporated to enhance state-of-the-art supervised techniques.
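
The abstract does not say how redundancy with respect to a known clustering is quantified. As a rough illustration only, the sketch below assumes one common information-theoretic choice: the mutual information between the known cluster labels and a candidate's labels, where a value near zero suggests the candidate is non-redundant. The function name `mutual_information` and the toy label lists are hypothetical and are not taken from the work itself.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same items.

    Assumption for illustration: low mutual information with a known
    clustering is read as "non-redundant". The source does not confirm
    that this is the measure used.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))  # co-occurrence counts n(a, b)
    count_a = Counter(labels_a)               # marginal counts n(a)
    count_b = Counter(labels_b)               # marginal counts n(b)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy example: four items, one known clustering, two candidate clusterings.
known = [0, 0, 1, 1]
redundant_candidate = [0, 0, 1, 1]  # repeats the known structure
novel_candidate = [0, 1, 0, 1]      # statistically independent of it

redundancy_scores = {
    "redundant_candidate": mutual_information(known, redundant_candidate),
    "novel_candidate": mutual_information(known, novel_candidate),
}
```

Under this assumed measure, a search that prefers low scores would favor `novel_candidate`, which matches the stated goal of steering away from known structure. A low score alone does not guarantee that the new clustering is a good one, so a practical method would also need a clustering-quality term.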