International Journal of Metadata, Semantics and Ontologies
Text clustering is an effective way not only to organize textual information but also to discover interesting patterns. Most existing methods, however, suffer from two main drawbacks: they cannot provide an understandable representation of text clusters, and they cannot scale to very large text collections. Highly scalable text clustering algorithms are becoming increasingly relevant. In this paper, we present a performance study of a new subspace clustering algorithm for large, sparse text data. The algorithm automatically calculates feature weights during the k-means clustering process. These weights are used both to discover clusters in subspaces of the text vector space and to identify the terms that represent the semantics of each cluster. We conducted a series of experiments to evaluate the algorithm's performance, covering both resource consumption and clustering quality. The results on real-world text data show that the algorithm quickly converges to a locally optimal solution and scales with the number of documents, the number of terms, and the number of clusters.
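The abstract says only that feature weights are computed inside the k-means loop. To show what such a loop can look like, here is a minimal sketch. It assumes an entropy-weighting k-means (EWKM) style update, where each cluster l carries term weights w_lj that sum to 1, and w_lj = exp(-D_lj/γ) / Σ_t exp(-D_lt/γ). Here D_lj is the within-cluster dispersion of term j and γ is a user parameter. This is an assumption, and the algorithm studied in the paper may use a different weighting rule. The function names `ewkm` and `top_terms`, the `init` and `gamma` parameters, and the toy vocabulary are all hypothetical. The code uses dense Python lists to stay dependency-free.

```python
import math
import random


def ewkm(X, k, gamma=1.0, max_iter=50, init=None, seed=0):
    """Entropy-weighted k-means sketch (assumed variant, not the paper's code).

    X: documents as equal-length lists of term values (n docs x m terms).
    init: optional list of k document indices used as initial centers.
    Returns (labels, centers, weights); weights[l][j] is the weight of
    term j in cluster l, and each weights[l] sums to 1.
    """
    n, m = len(X), len(X[0])
    if init is None:
        init = random.Random(seed).sample(range(n), k)
    centers = [list(X[i]) for i in init]
    weights = [[1.0 / m] * m for _ in range(k)]  # start with uniform weights
    labels = [-1] * n
    for _ in range(max_iter):
        # 1. Assign each document to the cluster with the smallest
        #    weighted squared Euclidean distance.
        new_labels = []
        for x in X:
            dists = [sum(w[j] * (x[j] - c[j]) ** 2 for j in range(m))
                     for c, w in zip(centers, weights)]
            new_labels.append(min(range(k), key=dists.__getitem__))
        # 2. Move each center to the mean of its members
        #    (an empty cluster keeps its previous center).
        for l in range(k):
            members = [X[i] for i in range(n) if new_labels[i] == l]
            if members:
                centers[l] = [sum(col) / len(members) for col in zip(*members)]
        # 3. Per-term dispersion D[j] within cluster l, then the closed-form
        #    update w[l][j] = exp(-D[j]/gamma) / sum_t exp(-D[t]/gamma).
        for l in range(k):
            D = [0.0] * m
            for i in range(n):
                if new_labels[i] == l:
                    for j in range(m):
                        D[j] += (X[i][j] - centers[l][j]) ** 2
            d_min = min(D)  # shifting by the minimum avoids underflow; it cancels out
            e = [math.exp(-(d - d_min) / gamma) for d in D]
            total = sum(e)
            weights[l] = [v / total for v in e]
        if new_labels == labels:  # assignments are stable: stop
            break
        labels = new_labels
    return labels, centers, weights


def top_terms(centers, weights, vocab, top_n=3):
    """Highest-weighted terms per cluster, restricted to terms that occur
    in the cluster (center value > 0). The restriction is an assumption:
    absent terms have zero dispersion and would otherwise rank first."""
    result = []
    for c, w in zip(centers, weights):
        present = [j for j in range(len(vocab)) if c[j] > 0]
        present.sort(key=lambda j: -w[j])
        result.append([vocab[j] for j in present[:top_n]])
    return result


# Toy usage: two "sports" documents and two "politics" documents.
vocab = ["ball", "goal", "vote", "party"]
X = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
labels, centers, weights = ewkm(X, k=2, init=[0, 2])
print(labels)                                 # [0, 0, 1, 1]
print(top_terms(centers, weights, vocab, 2))  # [['ball', 'goal'], ['vote', 'party']]
```

In the toy data, the "politics" terms are zero throughout the "sports" cluster. They therefore have zero dispersion there and receive the largest weights. This is why `top_terms` only ranks terms that actually occur in a cluster. A version intended for large collections would operate on sparse document vectors rather than dense lists.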