Nonnegative matrix factorization (NMF) has been successfully used as a clustering method, especially for flat partitioning of documents. In this paper, we propose an efficient hierarchical document clustering method based on a new algorithm for rank-2 NMF. When the two-block coordinate descent framework is applied to computing rank-2 NMF, each subproblem is a nonnegative least squares (NNLS) problem whose coefficient matrix has only two columns. We design the algorithm for rank-2 NMF by exploiting the fact that, for such NNLS problems, an exhaustive search for the optimal active set can be performed extremely fast. In addition, we design a measure, based on the results of rank-2 NMF, for determining which leaf node should be split next. On a number of text data sets, our proposed method produces high-quality tree structures in significantly less time than other methods such as hierarchical K-means, standard NMF, and latent Dirichlet allocation.
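To illustrate why the two-column NNLS subproblem is cheap, the following is a minimal sketch, not the algorithm as implemented in the paper. With two unknowns there are only four possible active sets (both coefficients zero, only the first free, only the second free, both free), so each can be solved in closed form and the best feasible candidate kept. The function name `nnls_two_columns`, the helper `_dot`, and the tolerance used to detect nearly collinear columns are assumptions introduced for this example.

```python
def _dot(u, v):
    """Inner product of two equal-length sequences (hypothetical helper)."""
    return sum(a * b for a, b in zip(u, v))


def nnls_two_columns(b1, b2, y):
    """Minimize ||g1*b1 + g2*b2 - y||^2 subject to g1 >= 0 and g2 >= 0.

    Enumerates all four active sets and returns the feasible candidate
    with the smallest objective value as a tuple (g1, g2).
    """
    # Entries of the 2x2 Gram matrix and of the right-hand side.
    a11, a12, a22 = _dot(b1, b1), _dot(b1, b2), _dot(b2, b2)
    c1, c2 = _dot(b1, y), _dot(b2, y)

    # Active set {1, 2}: both coefficients fixed at zero.
    candidates = [(0.0, 0.0)]

    # Active set {2}: only g1 is free; clamp to keep it nonnegative.
    if a11 > 0.0:
        candidates.append((max(c1 / a11, 0.0), 0.0))

    # Active set {1}: only g2 is free.
    if a22 > 0.0:
        candidates.append((0.0, max(c2 / a22, 0.0)))

    # Empty active set: solve the 2x2 normal equations by Cramer's rule.
    # The relative tolerance below is an assumption for this sketch.
    det = a11 * a22 - a12 * a12
    if det > 1e-12 * a11 * a22:
        g1 = (a22 * c1 - a12 * c2) / det
        g2 = (a11 * c2 - a12 * c1) / det
        if g1 >= 0.0 and g2 >= 0.0:
            candidates.append((g1, g2))

    def objective(g):
        # Squared residual minus the constant term ||y||^2.
        g1, g2 = g
        quadratic = a11 * g1 * g1 + 2.0 * a12 * g1 * g2 + a22 * g2 * g2
        return quadratic - 2.0 * (c1 * g1 + c2 * g2)

    return min(candidates, key=objective)


if __name__ == "__main__":
    # The unconstrained solution (3, -2) is infeasible, so g2 is set to zero.
    print(nnls_two_columns([1.0, 0.0], [0.0, 1.0], [3.0, -2.0]))
```

In a rank-2 NMF iteration, a routine of this kind would be applied once per row or column of the data matrix, alternating between the two factors. Because the Gram matrix is shared across all of those subproblems, it can be computed once per pass.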