This paper presents a document clustering algorithm based on cluster validation that identifies both the important feature words and the true model order (the number of clusters). An important feature subset is selected by optimizing a cluster validity criterion subject to a constraint. To identify the model order, this feature selection procedure is run for each candidate number of clusters. The feature subset and cluster number that together maximize the cluster validity criterion are chosen as the answer. We applied the algorithm to several datasets drawn from the 20Newsgroup corpus. Experimental results show that it finds an important feature subset, estimates the model order, and yields higher micro-averaged precision than four other document clustering algorithms that require the number of clusters to be supplied.
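The search loop described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract names neither the validity criterion, the constraint, nor the base clustering method, so the code below assumes a mean silhouette width as the criterion, a fixed feature-subset size as the constraint, and plain k-means with farthest-first seeding as the clusterer. The function names `kmeans`, `silhouette`, and `select_features_and_k`, and the toy term-count matrix, are all hypothetical.

```python
import math
from itertools import combinations


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-first seeding (an assumption)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centers.append(list(max(points, key=lambda p: min(math.dist(p, c) for c in centers))))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette width; a stand-in for the paper's unspecified validity criterion."""
    if len(set(labels)) < 2:
        return -1.0  # a single cluster cannot be validated, so rank it last
    total = 0.0
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own:
            continue  # singleton clusters contribute 0 by convention
        a = sum(own) / len(own)                            # mean distance within own cluster
        b = min(sum(d) / len(d) for d in dists.values())   # mean distance to nearest other cluster
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_features_and_k(docs, k_values, subset_size):
    """Try every (cluster number, feature subset) pair; keep the best-scoring one."""
    best = (-math.inf, None, None)
    for k in k_values:
        # Hypothetical constraint: every candidate subset has exactly subset_size features.
        for subset in combinations(range(len(docs[0])), subset_size):
            projected = [[doc[f] for f in subset] for doc in docs]
            score = silhouette(projected, kmeans(projected, k))
            if score > best[0]:
                best = (score, subset, k)
    return best


# Toy term-count matrix (made up): rows are documents, columns are feature words.
docs = [[9, 0, 4], [8, 1, 10], [0, 9, 2], [1, 8, 8], [9, 9, 0], [8, 8, 6]]
score, subset, k = select_features_and_k(docs, k_values=(2, 3, 4), subset_size=2)
print(score, subset, k)
```

Exhaustive enumeration of subsets is only workable for a handful of features. On a real vocabulary a greedy or floating search over features would be needed, and the silhouette could be replaced by whatever validity criterion is actually in use.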