In this paper, we propose a document clustering method that aims to achieve (1) high document clustering accuracy and (2) the ability to estimate the number of clusters in the document corpus (i.e., the model selection capability). To cluster the given document corpus accurately, we represent each document with a richer feature set and use the Gaussian Mixture Model (GMM) together with the Expectation-Maximization (EM) algorithm to produce an initial document clustering. From this initial result, we identify a set of discriminative features for each cluster and refine the initial document clusters by voting on the cluster label of each document using this discriminative feature set. This self-refinement process of discriminative feature identification and cluster label voting is applied iteratively until the document clusters converge. The model selection capability, in turn, is achieved by introducing randomness into the cluster initialization stage and then finding a value C for the number of clusters N at which running the document clustering process a fixed number of times yields sufficiently similar results. Performance evaluations show clear gains in both document clustering accuracy and model selection accuracy for the proposed method. The evaluations also demonstrate how each feature, as well as the cluster refinement process, contributes to the document clustering accuracy.
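The model selection idea described above (randomized initialization, repeated runs, and picking the number of clusters whose runs agree most) can be illustrated with a small sketch. This is not the paper's implementation. It makes several assumptions: plain k-means stands in for the GMM/EM clustering plus refinement, the Rand index stands in for the unspecified similarity measure between runs, "sufficiently similar" is simplified to "highest average agreement", and the toy 2-D points replace real document features. All function names (`kmeans`, `rand_index`, `stability`, `estimate_num_clusters`, `toy_data`) are hypothetical.

```python
import random
from itertools import combinations


def nearest(p, centers):
    """Index of the center closest to point p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))


def kmeans(points, k, rng, iters=50):
    """Plain k-means with randomly chosen initial centers (stand-in for GMM/EM)."""
    centers = rng.sample(points, k)  # the randomness in initialization
    labels = None
    for _ in range(iters):
        new_labels = [nearest(p, centers) for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, runs, rng):
    """Average pairwise agreement between `runs` randomly initialized clusterings."""
    results = [kmeans(points, k, rng) for _ in range(runs)]
    scores = [rand_index(x, y) for x, y in combinations(results, 2)]
    return sum(scores) / len(scores)


def estimate_num_clusters(points, candidates, runs=8, seed=0):
    """Return the candidate cluster count whose repeated runs agree the most."""
    rng = random.Random(seed)
    scores = {k: stability(points, k, runs, rng) for k in candidates}
    return max(scores, key=scores.get), scores


def toy_data(seed=1):
    """Three well-separated 2-D blobs of 20 points each (illustrative data only)."""
    rng = random.Random(seed)
    blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
    return [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in blob_centers for _ in range(20)]


points = toy_data()
best, scores = estimate_num_clusters(points, candidates=[2, 3, 4, 5])
print("estimated number of clusters:", best)
print("stability per candidate:", scores)
```

The candidate list starts at 2 because a single cluster is trivially stable under this score. A version closer to the abstract would replace the argmax with a threshold on run-to-run similarity, but the abstract does not state that threshold, so it is left out here.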