Because documents are high-dimensional and sparse, clustering similar documents together is a difficult task. K-Means, the most popular document clustering method, suffers from intra-cluster dissimilarity, i.e. it tends to cluster unrelated documents together. One reason is that all objects (documents) in a cluster exert the same influence on the mean of the cluster. SOM (Self-Organizing Map) is a method that reduces the dimensionality of data and displays the data in a low-dimensional space, and it has been applied successfully to the clustering of high-dimensional objects. The scalar factor is an important part of SOM. In this paper, an optimized K-Means algorithm is proposed. It introduces the scalar factor from SOM into the means during the K-Means assignment stage in order to control the influence of new objects on the means. Experiments show that the optimized K-Means algorithm achieves a higher F-Measure and a lower Entropy than the standard K-Means algorithm, and thereby effectively reduces the intra-dissimilarity of clusters.
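The abstract does not state the exact update rule, so the following is only a minimal sketch of the general idea, under stated assumptions: each time an object is assigned to its nearest mean, that mean is moved toward the object by a SOM-style scalar factor alpha(t), here assumed to decay linearly from `alpha0` to zero. The function name `scaled_kmeans`, the parameters `alpha0` and `epochs`, and the linear decay schedule are illustrative choices, not taken from the paper.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_mean(x, means):
    """Index of the mean closest to x."""
    return min(range(len(means)), key=lambda c: squared_distance(x, means[c]))


def scaled_kmeans(points, k, epochs=10, alpha0=0.5, seed=0):
    """K-Means variant with a SOM-like scalar factor (illustrative sketch).

    Assumption: the mean is updated online as m <- m + alpha(t) * (x - m),
    where alpha(t) decays linearly from alpha0 to 0 over all update steps.
    """
    rng = random.Random(seed)
    # Initialise the means from k distinct, randomly chosen points.
    means = [list(p) for p in rng.sample(points, k)]
    total_steps = epochs * len(points)
    step = 0
    for _ in range(epochs):
        for x in points:
            j = nearest_mean(x, means)
            # Scalar factor: later objects influence the mean less.
            alpha = alpha0 * (1.0 - step / total_steps)
            means[j] = [m + alpha * (xi - m) for xi, m in zip(x, means[j])]
            step += 1
    # Final assignment against the fixed means.
    labels = [nearest_mean(x, means) for x in points]
    return means, labels


if __name__ == "__main__":
    # Two well-separated toy groups standing in for document vectors.
    docs = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    means, labels = scaled_kmeans(docs, k=2)
    print(labels)
```

In standard batch K-Means every member of a cluster contributes equally to the recomputed mean; in this sketch the contribution of an object depends on when it is seen, which is one plausible reading of "controlling the influence to the means from new objects".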