Bicluster analysis is an unsupervised learning method that detects homogeneous or distinctively characterized two-way subsets of objects and attributes in a data set. It can find groups that traditional cluster analysis misses and supports intuitive interpretation of those groups, especially for high-dimensional data sets. Because of these advantages, various biclustering algorithms have been developed over the last few years and applied in bioinformatics and text mining. However, research on validating bicluster solutions is rare. We propose a new procedure for validating bicluster solutions, based on a stability index that measures the reproducibility of a solution under variation in the input data set. We quantify the stability of a bicluster solution by generating random resampled data sets from the input data set, obtaining bicluster solutions from them, and evaluating the expected agreement of those solutions with the bicluster solution for the original input data set. Experiments on three artificial data sets and two real gene expression data sets indicate that the proposed method is suitable for validating bicluster solutions.
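The resampling procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the index defined in the paper: the abstract does not specify the resampling scheme or the agreement measure, so the sketch assumes row subsampling without replacement and a Jaccard overlap of covered (row, column) cells. The names `stability_index`, `toy_bicluster`, `covered_cells`, and `jaccard` are hypothetical, and `toy_bicluster` is a trivial stand-in for a real biclustering algorithm.

```python
import random
from typing import Callable, List, Set, Tuple

Matrix = List[List[float]]
Bicluster = Tuple[Set[int], Set[int]]  # (row indices, column indices)


def covered_cells(biclusters: List[Bicluster]) -> Set[Tuple[int, int]]:
    """Return every (row, col) cell covered by at least one bicluster."""
    return {(r, c) for rows, cols in biclusters for r in rows for c in cols}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard overlap; two empty sets count as perfect agreement."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def stability_index(
    data: Matrix,
    bicluster_fn: Callable[[Matrix], List[Bicluster]],
    n_resamples: int = 20,
    fraction: float = 0.8,
    seed: int = 0,
) -> float:
    """Average agreement between the original solution and resampled ones.

    Assumption: resampling = drawing a fraction of rows without replacement;
    agreement = Jaccard overlap of covered cells on the rows that were kept.
    """
    rng = random.Random(seed)
    n_rows = len(data)
    reference = covered_cells(bicluster_fn(data))
    scores = []
    for _ in range(n_resamples):
        kept = sorted(rng.sample(range(n_rows), max(1, int(fraction * n_rows))))
        found = bicluster_fn([data[i] for i in kept])
        # Map row indices of the resample back to the original data set.
        mapped = [({kept[r] for r in rows}, cols) for rows, cols in found]
        # Compare only on rows present in this resample.
        kept_set = set(kept)
        ref_restricted = {(r, c) for (r, c) in reference if r in kept_set}
        scores.append(jaccard(covered_cells(mapped), ref_restricted))
    return sum(scores) / len(scores)


def toy_bicluster(data: Matrix, threshold: float = 0.5, min_hits: int = 2) -> List[Bicluster]:
    """Hypothetical stand-in algorithm: one bicluster of 'high' rows and columns."""
    high = [[v > threshold for v in row] for row in data]
    rows = {r for r, flags in enumerate(high) if sum(flags) >= min_hits}
    cols = {c for c in range(len(data[0])) if sum(high[r][c] for r in rows) >= min_hits}
    return [(rows, cols)] if rows and cols else []


# Noise-free 8x6 matrix with a planted block in rows 0-3, columns 0-2.
demo = [[1.0 if r < 4 and c < 3 else 0.0 for c in range(6)] for r in range(8)]
print(stability_index(demo, toy_bicluster))
```

A value near 1 means the solution is largely reproduced on the resampled data, while a value near 0 means it depends heavily on the particular rows observed. Any biclustering routine that returns a list of (row set, column set) pairs can be passed in place of `toy_bicluster`.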