Fast rank-2 nonnegative matrix factorization for hierarchical document clustering

Authors:
Da Kuang;Haesun Park
Affiliations:
Georgia Institute of Technology, Atlanta, GA, USA;Georgia Institute of Technology, Atlanta, GA, USA
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013



Abstract

Nonnegative matrix factorization (NMF) has been used successfully as a clustering method, especially for flat partitioning of documents. In this paper, we propose an efficient hierarchical document clustering method based on a new algorithm for rank-2 NMF. When the two-block coordinate descent framework based on nonnegative least squares (NNLS) is applied to computing rank-2 NMF, each subproblem is an NNLS problem whose coefficient matrix has only two columns. We design the rank-2 NMF algorithm by exploiting the fact that an exhaustive search for the optimal active set can be performed extremely fast when solving these NNLS problems. In addition, we design a measure, based on the results of rank-2 NMF, for determining which leaf node should be split next. On a number of text data sets, our proposed method produces high-quality tree structures in significantly less time than other methods such as hierarchical K-means, standard NMF, and latent Dirichlet allocation.
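
To illustrate why an exhaustive active-set search is cheap when the coefficient matrix has only two columns, the following is a minimal Python sketch. It is not the authors' implementation, which the abstract does not spell out; the function name `nnls_two_columns` and its argument names are hypothetical. With two variables there are only four possible active sets (both free, only the first free, only the second free, both fixed at zero), so each candidate can be written in closed form and the best feasible one kept.

```python
from typing import Sequence, Tuple


def nnls_two_columns(b1: Sequence[float],
                     b2: Sequence[float],
                     y: Sequence[float]) -> Tuple[float, float]:
    """Solve min ||g1*b1 + g2*b2 - y||_2 subject to g1 >= 0, g2 >= 0.

    Hypothetical helper: enumerates all four active sets of the
    two-variable problem and returns the best feasible candidate.
    """
    # Entries of the 2x2 Gram matrix B^T B and of the vector B^T y.
    a11 = sum(u * u for u in b1)
    a12 = sum(u * v for u, v in zip(b1, b2))
    a22 = sum(v * v for v in b2)
    c1 = sum(u * t for u, t in zip(b1, y))
    c2 = sum(v * t for v, t in zip(b2, y))

    def objective(g: Tuple[float, float]) -> float:
        # Equals ||B g - y||^2 minus the constant ||y||^2.
        g1, g2 = g
        return (a11 * g1 * g1 + 2.0 * a12 * g1 * g2 + a22 * g2 * g2
                - 2.0 * (c1 * g1 + c2 * g2))

    # Active set 1: both variables fixed at zero.
    candidates = [(0.0, 0.0)]
    # Active set 2: only g1 free (least squares on column b1 alone).
    if a11 > 0:
        candidates.append((c1 / a11, 0.0))
    # Active set 3: only g2 free (least squares on column b2 alone).
    if a22 > 0:
        candidates.append((0.0, c2 / a22))
    # Active set 4: both free (2x2 normal equations via Cramer's rule).
    det = a11 * a22 - a12 * a12
    if det > 1e-12 * a11 * a22:  # skip (nearly) collinear columns
        candidates.append(((a22 * c1 - a12 * c2) / det,
                           (a11 * c2 - a12 * c1) / det))

    # Keep the candidates that satisfy the nonnegativity constraints
    # and return the one with the smallest objective value.
    feasible = [g for g in candidates if g[0] >= 0 and g[1] >= 0]
    return min(feasible, key=objective)


# The unconstrained solution here is (1, -1), which is infeasible,
# so the search falls back to using the first column alone.
print(nnls_two_columns([1, 0], [0, 1], [1, -1]))  # (1.0, 0.0)
```

In a two-block coordinate descent scheme for rank-2 NMF, a routine of this kind would be applied independently to every column (or row) of the data matrix while the other factor is held fixed; because each call needs only a handful of inner products and comparisons, no iterative NNLS solver is required.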