Nonnegative Matrix Factorization (NMF) is an effective dimension reduction method for non-negative dyadic data, and has proven useful in many areas, such as text mining, bioinformatics, and image processing. NMF is usually formulated as a constrained non-convex optimization problem, and many algorithms have been developed for solving it. Recently, a coordinate descent method called FastHals has been proposed to solve least squares NMF and is regarded as one of the state-of-the-art techniques for the problem. In this paper, we first show that FastHals has an inefficiency: it uses a cyclic coordinate descent scheme and thus performs unneeded descent steps on unimportant variables. We then present a variable selection scheme that uses the gradient of the objective function to arrive at a new coordinate descent method. Our new method is considerably faster in practice, and we show that it has theoretical convergence guarantees. Moreover, when the solution is sparse, as is often the case in real applications, our new method benefits from selecting important variables to update more often, resulting in higher speed. As an example, on the text dataset RCV1, our method is 7 times faster than FastHals, and more than 15 times faster when the sparsity is increased by adding an L1 penalty. We also develop new coordinate descent methods for NMF when the error is measured by KL-divergence, applying the Newton method to solve the one-variable sub-problems. Experiments indicate that our algorithm for minimizing the KL-divergence is faster than the Lee & Seung multiplicative rule by a factor of 10 on the CBCL image dataset.
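The idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: for the least-squares objective ||V - WH||_F^2 with H fixed to be updated, each one-variable subproblem in an entry of H has a closed-form non-negative solution, and a greedy rule picks the entry whose update most decreases the objective (the function name and interface are hypothetical, and we assume W has no zero columns so the diagonal of W^T W is positive):

```python
import numpy as np

def greedy_cd_update_H(V, W, H, n_updates=100):
    """Greedy coordinate descent on H for min 0.5*||V - W H||_F^2, H >= 0.

    Illustrative sketch only: instead of cycling over all entries of H,
    each step updates the single entry with the largest predicted
    objective decrease, mirroring the gradient-based variable selection
    described in the abstract.
    """
    WtW = W.T @ W                       # k x k Gram matrix
    G = WtW @ H - W.T @ V               # gradient w.r.t. H
    d = np.diag(WtW)                    # curvature of each 1-D subproblem
    for _ in range(n_updates):
        # Closed-form non-negative solution of every one-variable subproblem.
        S = np.maximum(H - G / d[:, None], 0.0) - H
        # Predicted objective decrease for each candidate single-entry update.
        D = -(G * S + 0.5 * d[:, None] * S**2)
        i, j = np.unravel_index(np.argmax(D), D.shape)
        if D[i, j] <= 1e-12:            # no coordinate improves the objective
            break
        # Apply one update and maintain the gradient incrementally:
        # changing H[i, j] by s changes column j of G by WtW[:, i] * s.
        H[i, j] += S[i, j]
        G[:, j] += WtW[:, i] * S[i, j]
    return H
```

Because the gradient is maintained incrementally, each single-entry update costs O(k) for the gradient column plus the cost of the selection step; on sparse solutions most entries stay at zero and the greedy rule concentrates work on the few variables that matter.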