This paper presents a new method for clustering large and complex text data. The method is based on a new subspace clustering algorithm that automatically calculates feature weights during the k-means clustering process. When clustering sparse text data, the feature weights are used to discover clusters in subspaces of the document vector space and to identify the keywords that represent the semantics of each cluster. We present a modification of the published algorithm that addresses the sparsity problem arising in text clustering. Experimental results on real-world text data show that the new method outperforms the Standard KMeans and Bisection-KMeans algorithms while maintaining the efficiency of the k-means clustering process.
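The abstract does not give the update rules, so the following is only a minimal sketch of how per-cluster feature weights might be computed inside a k-means loop. It is not the paper's algorithm. The function name `feature_weighted_kmeans`, the exponent `beta`, the weight formula (inverse within-cluster dispersion, normalized per cluster), and the small constant `sigma` added to each dispersion so that features that are zero throughout a cluster do not cause division by zero are all assumptions made for illustration.

```python
import numpy as np

def feature_weighted_kmeans(X, k, beta=2.0, sigma=1e-3, n_iter=20, seed=0):
    """Illustrative k-means with one weight per (cluster, feature) pair.

    Assumptions (not taken from the source): beta > 1 controls how sharply
    weights concentrate; sigma is a small constant that keeps dispersions
    strictly positive on sparse data.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Start from k distinct data points and uniform feature weights.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    weights = np.full((k, m), 1.0 / m)
    labels = np.zeros(n, dtype=int)

    for _ in range(n_iter):
        # Assignment step: weighted squared distance to every center.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2      # shape (n, k, m)
        dist = (diff2 * (weights ** beta)[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)

        for l in range(k):
            members = X[labels == l]
            if len(members) == 0:
                continue  # keep the previous center and weights
            # Center update: plain mean of the cluster members.
            centers[l] = members.mean(axis=0)
            # Weight update: features with low within-cluster dispersion
            # receive high weight; sigma avoids zero dispersion.
            disp = ((members - centers[l]) ** 2).sum(axis=0) + sigma
            inv = disp ** (-1.0 / (beta - 1.0))
            weights[l] = inv / inv.sum()

    return labels, centers, weights
```

Under these assumptions, the highest-weighted features of a cluster `l` could be read off with `np.argsort(weights[l])[::-1][:10]` and mapped back to vocabulary terms as candidate keywords. The dense `(n, k, m)` intermediate is only practical for small inputs; a real text collection would need a sparse-matrix formulation.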