Discovering local structure in gene expression data: the order-preserving submatrix problem
Proceedings of the sixth annual international conference on Computational biology
Biclustering of Expression Data
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
Document clustering based on non-negative matrix factorization
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in informaion retrieval
Cluster ensembles --- a knowledge reuse framework for combining multiple partitions
The Journal of Machine Learning Research
Biclustering Algorithms for Biological Data Analysis: A Survey
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Non-negative Matrix Factorization with Sparseness Constraints
The Journal of Machine Learning Research
Relation between PLSA and NMF and implications
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
A general model for clustering binary data
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Nonorthogonal decomposition of binary matrices for bounded-error data compression and analysis
ACM Transactions on Mathematical Software (TOMS)
Discriminative cluster analysis
ICML '06 Proceedings of the 23rd international conference on Machine learning
Multiplicative Updates for Nonnegative Quadratic Programming
Neural Computation
Binary Matrix Factorization with Applications
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
A hierarchical model for ordinal matrix factorization
Statistics and Computing
Hi-index | 0.00 |
The advent of microarray technology enables us to monitor an entire genome in a single chip using a systematic approach. Clustering, as a widely used data mining approach, has been used to discover phenotypes from the raw expression data. However traditional clustering algorithms have limitations since they can not identify the substructures of samples and features hidden behind the data. Different from clustering, biclustering is a new methodology for discovering genes that are highly related to a subset of samples. Several biclustering models/methods have been presented and used for tumor clinical diagnosis and pathological research. In this paper, we present a new biclustering model using Binary Matrix Factorization (BMF). BMF is a new variant rooted from non-negative matrix factorization (NMF). We begin by proving a new boundedness property of NMF. Two different algorithms to implement the model and their comparison are then presented. We show that the microarray data biclustering problem can be formulated as a BMF problem and can be solved effectively using our proposed algorithms. Unlike the greedy strategy-based algorithms, our proposed algorithms for BMF are more likely to find the global optima. Experimental results on synthetic and real datasets demonstrate the advantages of BMF over existing biclustering methods. Besides the attractive clustering performance, BMF can generate sparse results (i.e., the number of genes/features involved in each biclustering structure is very small related to the total number of genes/features) that are in accordance with the common practice in molecular biology.