From cluster ensemble to structure ensemble

  • Authors:
  • Zhiwen Yu;Jane You;Hau-San Wong;Guoqiang Han

  • Affiliations:
  • School of Computer Science and Engineering, South China University of Technology, China and Department of Computing, Hong Kong Polytechnic University, Hong Kong;Department of Computing, Hong Kong Polytechnic University, Hong Kong;Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong;School of Computer Science and Engineering, South China University of Technology, China

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012

Quantified Score

Hi-index 0.07

Visualization

Abstract

This paper investigates the problem of integrating multiple structures which are extracted from different sets of data points into a single unified structure. We first propose a new generalized concept called structure ensemble for the fusion of multiple structures. Unlike traditional cluster ensemble approaches the main objective of which is to align individual labels obtained from different clustering solutions, the structure ensemble approach focuses on how to unify the structures obtained from different data sources. Based on this framework, a new structure ensemble approach called the probabilistic bagging based structure ensemble approach (BSEA) is designed, which integrates the bagging technique, the force based self-organizing map (FBSOM) and the normalized cut algorithm into the proposed framework. BSEA views structures obtained from different datasets generated by the bagging technique as nodes in a graph, and adopts graph theory to find the most representative structure. In addition, the force based self-organizing map (FBSOM), which is a generalized form of SOM, is proposed to serve as the basic clustering algorithm in the structure ensemble framework. Finally, a new external index called correlation index (CI), which considers the correlation relationship of both the similarity and dissimilarity between the predicted solution and the true solution, is proposed to evaluate the performance of BSEA. The experiments show that (i) The performance of BSEA outperforms most of the state-of-the-art clustering approaches, and (ii) BSEA performs well on datasets from the UCI repository and real cancer gene expression profiles.