Ensemble clustering with voting active clusters

  • Authors: Kagan Tumer; Adrian K. Agogino

  • Affiliations: Oregon State University, 204 Rogers Hall, Corvallis, OR 97330, USA; UCSC, NASA Ames Res. Ctr., Mailstop 269-3, Moffett Field, CA 94035, USA

  • Venue: Pattern Recognition Letters
  • Year: 2008

Abstract

Clustering is an integral part of pattern recognition problems and is connected to both the data reduction and the data understanding steps. Combining multiple clusterings into an ensemble clustering is critical in many real-world applications, particularly in domains with large data sets, high-dimensional feature sets, and proprietary data. This paper presents voting active clusters (VACs), a method for combining multiple "base" clusterings into a single unified "ensemble" clustering that is robust against missing data and does not require all the data to be collected in one central location. In this approach, separate processing centers produce many base clusterings, each based on some portion of the data. The clusterings from these processing centers are then pooled through a voting mechanism to produce a unified ensemble clustering. The major contribution of this work is an adaptive voting method by which the clusterings (e.g., spatially distributed processing centers) update their votes in order to maximize an overall quality measure. Our results show that this method achieves comparable or better performance than traditional cluster ensemble methods in noise-free conditions, and remains effective in noisy scenarios where many traditional methods are inapplicable.
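
To make the pooling-and-voting idea concrete, the following is a minimal Python sketch, not the authors' VAC algorithm. It rests on several assumptions the abstract does not state: each base cluster casts one vote for an ensemble label, a data point takes the plurality of the votes cast by the base clusters that contain it, missing observations are marked with `None`, and votes are adapted by a simple greedy sweep. The function names (`combine`, `quality`, `adapt_votes`) are hypothetical, and the pairwise-agreement score is only a stand-in for the paper's quality measure, which the abstract does not name.

```python
from collections import Counter
from itertools import combinations


def combine(base, votes):
    """Assign each point the plurality of the votes cast by its base clusters.

    base  : list of label lists, one per base clustering (None = missing point)
    votes : list of dicts, one per base clustering, mapping a base cluster
            label to the ensemble label that cluster currently votes for
    """
    ensemble = []
    for i in range(len(base[0])):
        tally = Counter(
            vote[labels[i]]
            for labels, vote in zip(base, votes)
            if labels[i] is not None
        )
        # Ties go to the smallest ensemble label; unseen points stay None.
        ensemble.append(max(sorted(tally), key=tally.get) if tally else None)
    return ensemble


def quality(base, ensemble):
    """Stand-in quality measure (assumption): fraction of observed point pairs
    on which a base clustering and the ensemble agree about co-membership."""
    agree = total = 0
    for labels in base:
        for i, j in combinations(range(len(labels)), 2):
            if None in (labels[i], labels[j], ensemble[i], ensemble[j]):
                continue
            total += 1
            agree += (labels[i] == labels[j]) == (ensemble[i] == ensemble[j])
    return agree / total if total else 0.0


def adapt_votes(base, k, sweeps=5):
    """Greedy sweep (assumption): each base cluster in turn keeps whichever
    vote in range(k) strictly improves the overall quality."""
    votes = [
        {c: idx % k
         for idx, c in enumerate(sorted({l for l in labels if l is not None}))}
        for labels in base
    ]
    best = quality(base, combine(base, votes))
    for _ in range(sweeps):
        improved = False
        for vote in votes:
            for c in list(vote):
                keep = vote[c]
                for candidate in range(k):
                    vote[c] = candidate
                    q = quality(base, combine(base, votes))
                    if q > best:
                        best, keep, improved = q, candidate, True
                vote[c] = keep
        if not improved:
            break
    return combine(base, votes), best


# Toy data: three base clusterings of six points. The second uses permuted
# labels and the third is missing two points.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, None, 1, 1, None],
]
ensemble, score = adapt_votes(base, k=2)
```

On this toy input the initial votes conflict because the second clustering labels the same two groups in the opposite order. The greedy sweep resolves the conflict by changing one cluster's vote, after which the ensemble separates points 0 to 2 from points 3 to 5. Note that only labels and votes are exchanged in this sketch, never the underlying feature vectors, which is in the spirit of the abstract's point that the data need not be collected in one central location.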