Binary matrix factorization for analyzing gene expression data

Authors:
Zhong-Yuan Zhang; Tao Li; Chris Ding; Xian-Wen Ren; Xiang-Sun Zhang
Affiliations:
School of Statistics, Central University of Finance and Economics, Beijing, People's Republic of China; School of Computing and Information Sciences, Florida International University, Miami, USA; Department of Computer Science and Engineering, University of Texas, Arlington, USA; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, People's Republic of China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, People's Republic of China
Venue:
Data Mining and Knowledge Discovery
Year:
2010

Abstract

The advent of microarray technology enables us to monitor an entire genome on a single chip using a systematic approach. Clustering, a widely used data mining approach, has been applied to discover phenotypes from raw expression data. However, traditional clustering algorithms are limited because they cannot identify the substructures of samples and features hidden in the data. Unlike clustering, biclustering is a new methodology for discovering genes that are highly related to a subset of samples. Several biclustering models and methods have been proposed and applied to clinical tumor diagnosis and pathological research. In this paper, we present a new biclustering model based on Binary Matrix Factorization (BMF), a new variant of non-negative matrix factorization (NMF). We begin by proving a new boundedness property of NMF. We then present two algorithms that implement the model and compare them. We show that the microarray data biclustering problem can be formulated as a BMF problem and solved effectively with our proposed algorithms. Unlike greedy algorithms, our proposed BMF algorithms are more likely to find the global optima. Experimental results on synthetic and real datasets demonstrate the advantages of BMF over existing biclustering methods. Besides its attractive clustering performance, BMF can generate sparse results (i.e., the number of genes/features involved in each bicluster is very small relative to the total number of genes/features), which is in accordance with common practice in molecular biology.
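The abstract does not spell out the two BMF algorithms. As a rough, non-authoritative illustration of the general idea (factorize a binary matrix with NMF, binarize the factors, then read each bicluster off a column of W and the matching row of H), here is a minimal pure-Python sketch. Everything in it is an assumption rather than the authors' method: the NMF step uses the standard Lee-Seung multiplicative updates for the Frobenius loss; thresholds are taken relative to each column maximum of W and each row maximum of H and are chosen by grid search to minimize ||X - W_b H_b||_F^2 with an ordinary (not Boolean) matrix product. The function names (nmf, binarize, threshold_search, biclusters) and the toy matrix are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf(x, k, iters=500, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates minimizing ||X - WH||_F^2 (assumed NMF step)."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)        # W^T X, W^T W H
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n)] for a in range(k)]
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))        # X H^T, W H H^T
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(m)]
    return w, h


def binarize(w, h, tw, th):
    """Hypothetical relative thresholding: keep entries above tw (th) times the column (row) max."""
    k = len(h)
    cmax = [max(row[a] for row in w) or 1.0 for a in range(k)]
    rmax = [max(h[a]) or 1.0 for a in range(k)]
    wb = [[1 if row[a] > tw * cmax[a] else 0 for a in range(k)] for row in w]
    hb = [[1 if v > th * rmax[a] else 0 for v in h[a]] for a in range(k)]
    return wb, hb


def sq_error(x, wb, hb):
    """Squared Frobenius error of the binary reconstruction (ordinary product)."""
    p = matmul(wb, hb)
    return sum((x[i][j] - p[i][j]) ** 2 for i in range(len(x)) for j in range(len(x[0])))


def threshold_search(x, w, h, grid=None):
    """Grid-search the two thresholds; the grid ends at 1.0, which zeroes every factor."""
    grid = grid or [t / 20 for t in range(1, 21)]  # 0.05, 0.10, ..., 1.0
    best = None
    for tw in grid:
        for th in grid:
            wb, hb = binarize(w, h, tw, th)
            err = sq_error(x, wb, hb)
            if best is None or err < best[0]:
                best = (err, wb, hb)
    return best


def biclusters(wb, hb):
    """Bicluster a = (rows with W_b[:, a] = 1, columns with H_b[a, :] = 1)."""
    return [([i for i, row in enumerate(wb) if row[a]],
             [j for j, v in enumerate(hb[a]) if v]) for a in range(len(hb))]


if __name__ == "__main__":
    # Toy binary matrix with two planted blocks (hypothetical data, rows = genes).
    X = [[1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [1, 1, 1, 0, 0, 0],
         [0, 0, 0, 1, 1, 0],
         [0, 0, 0, 1, 1, 0]]
    W, H = nmf(X, k=2)
    err, Wb, Hb = threshold_search(X, W, H)
    print("squared error:", err)
    for genes, samples in biclusters(Wb, Hb):
        print("rows", genes, "cols", samples)
```

Because the threshold grid includes 1.0, the selected binarization is never worse than the all-zero reconstruction. Random restarts over several seeds, a finer grid, or a penalty term that pushes the NMF factors toward {0, 1} during optimization are natural alternatives to this post-hoc thresholding.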