Stability-based validation of bicluster solutions

  • Authors:
  • Youngrok Lee; Jeonghwa Lee; Chi-Hyuck Jun

  • Affiliations:
  • Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang 790-784, Republic of Korea (all three authors)

  • Venue:
  • Pattern Recognition
  • Year:
  • 2011

Abstract

Bicluster analysis is an unsupervised learning method that detects homogeneous or uniquely characterized two-way subsets of objects and attributes in a data set. It can find groups that traditional cluster analysis may miss, and the groups it finds are easy to interpret, especially in high-dimensional data sets. Because of these advantages, many biclustering algorithms have been developed over the last few years and applied in bioinformatics and text mining. However, little research has addressed the validation of bicluster solutions. We propose a new procedure for validating bicluster solutions by developing a stability index that measures how reproducible a solution is under variation in the input data set. We quantify the stability of a bicluster solution by generating random resampled data sets from the input data set, obtaining bicluster solutions from them, and evaluating the expected agreement between these solutions and the bicluster solution for the original input data set. Experiments on three artificial data sets and two real gene expression data sets indicate that the proposed method is suitable for validating bicluster solutions.
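The resample, rebicluster, and compare loop described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' stability index: the abstract does not specify the resampling scheme or the agreement measure. Here we assume that rows and columns are subsampled without replacement (80% of each). We also assume that agreement is the cell-level Jaccard index of each reference bicluster with its best match among the resampled biclusters, computed after restricting the reference to the resampled rows and columns. `threshold_bicluster` and `make_planted` are hypothetical toy helpers, not a real biclustering algorithm.

```python
import random
from statistics import mean

Bicluster = tuple[frozenset[int], frozenset[int]]  # (row ids, column ids)


def bicluster_jaccard(a: Bicluster, b: Bicluster) -> float:
    """Cell-level Jaccard index: |cells(a) & cells(b)| / |cells(a) | cells(b)|."""
    shared = len(a[0] & b[0]) * len(a[1] & b[1])  # (R∩R') x (C∩C')
    union = len(a[0]) * len(a[1]) + len(b[0]) * len(b[1]) - shared
    return shared / union if union else 0.0


def solution_agreement(reference: list[Bicluster], found: list[Bicluster]) -> float:
    """Assumed agreement: mean best-match Jaccard of each reference bicluster."""
    if not reference:
        return 0.0
    return mean(max((bicluster_jaccard(r, f) for f in found), default=0.0)
                for r in reference)


def threshold_bicluster(data, rows, cols, t=5.0) -> list[Bicluster]:
    """Hypothetical toy algorithm: one bicluster spanning all cells above t."""
    hits = [(i, j) for i in rows for j in cols if data[i][j] > t]
    if not hits:
        return []
    return [(frozenset(i for i, _ in hits), frozenset(j for _, j in hits))]


def stability_index(data, algorithm, n_resamples=20, frac=0.8, seed=0) -> float:
    """Mean agreement between the full-data solution and resampled solutions."""
    n_rows, n_cols = len(data), len(data[0])
    reference = algorithm(data, range(n_rows), range(n_cols))
    rng = random.Random(seed)
    scores = []
    for _ in range(n_resamples):
        # Assumed resampling scheme: subsample rows and columns without replacement.
        rows = frozenset(rng.sample(range(n_rows), max(1, round(frac * n_rows))))
        cols = frozenset(rng.sample(range(n_cols), max(1, round(frac * n_cols))))
        # Compare only on the rows/columns that survived the resample.
        ref_sub = [(r & rows, c & cols) for r, c in reference]
        ref_sub = [b for b in ref_sub if b[0] and b[1]]
        if not ref_sub:
            continue
        found = algorithm(data, sorted(rows), sorted(cols))
        scores.append(solution_agreement(ref_sub, found))
    return mean(scores) if scores else 0.0


def make_planted(n_rows=20, n_cols=10, seed=1):
    """Hypothetical test data: uniform noise in [-1, 1] plus a +10 block (rows 0-5, cols 0-3)."""
    rng = random.Random(seed)
    data = [[rng.uniform(-1, 1) for _ in range(n_cols)] for _ in range(n_rows)]
    for i in range(6):
        for j in range(4):
            data[i][j] += 10.0
    return data


if __name__ == "__main__":
    print(stability_index(make_planted(), threshold_bicluster))  # 1.0
```

Because the planted cells lie in [9, 11] and the noise never exceeds 1, the toy algorithm recovers exactly the surviving part of the planted block in every resample, so the index is 1.0. A less reproducible algorithm or weaker structure would push the value toward 0. Restricting the reference to the resampled rows and columns keeps subsampling alone from lowering the score.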