Fast, high-quality document clustering is crucial for organizing information and search engine results, enhancing web crawling, and supporting information retrieval and filtering. Recent studies have shown that K-means, the most commonly used partition-based clustering algorithm, is more suitable for large datasets. However, K-means can converge to a locally optimal solution. In this paper we propose a novel Harmony K-means Algorithm (HKA) that performs document clustering using the Harmony Search (HS) optimization method. We prove, by means of finite Markov chain theory, that HKA converges to the global optimum. To demonstrate the effectiveness and speed of HKA, we apply it to several standard datasets and compare it with other meta-heuristic and model-based document clustering approaches. Experimental results show that HKA converges to the best known optimum faster than the other methods, and that the quality of its clusters is comparable.
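To make the idea of combining Harmony Search with K-means concrete, here is a minimal sketch. It is not the HKA described in the paper, because the abstract does not specify the solution encoding, fitness function, or parameter values. Everything below is an assumption made for illustration: each harmony is a vector of cluster labels, the cost is the sum of squared Euclidean distances to cluster centroids, pitch adjustment is a random relabeling, and the names harmony_kmeans, hms, hmcr, and par are hypothetical.

```python
import random


def centroids(points, labels, k):
    """Mean vector of each cluster; None for a cluster with no members."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(k)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost(points, labels, k):
    """Assumed fitness: sum of squared distances to the assigned centroid."""
    cents = centroids(points, labels, k)
    return sum(sq_dist(p, cents[c]) for p, c in zip(points, labels))


def kmeans_step(points, labels, k):
    """One K-means reassignment: move each point to its nearest centroid."""
    cents = centroids(points, labels, k)
    live = [c for c in range(k) if cents[c] is not None]
    return [min(live, key=lambda c: sq_dist(p, cents[c])) for p in points]


def harmony_kmeans(points, k, hms=8, hmcr=0.9, par=0.3, iters=200, seed=0):
    """Hypothetical hybrid: Harmony Search explores, K-means refines."""
    rng = random.Random(seed)
    n = len(points)
    # Harmony memory: hms random label vectors, each stored with its cost.
    memory = []
    for _ in range(hms):
        labels = [rng.randrange(k) for _ in range(n)]
        memory.append((cost(points, labels, k), labels))
    for _ in range(iters):
        new = []
        for i in range(n):
            if rng.random() < hmcr:
                # Memory consideration: reuse a label stored for this point.
                value = rng.choice(memory)[1][i]
                if rng.random() < par:
                    # Pitch adjustment (assumed discrete form): relabel.
                    value = rng.randrange(k)
            else:
                # Random selection from the whole label range.
                value = rng.randrange(k)
            new.append(value)
        # Local refinement with a single K-means reassignment step.
        new = kmeans_step(points, new, k)
        new_cost = cost(points, new, k)
        # Replace the worst harmony if the new one is better.
        worst = max(range(hms), key=lambda j: memory[j][0])
        if new_cost < memory[worst][0]:
            memory[worst] = (new_cost, new)
    return min(memory, key=lambda m: m[0])


if __name__ == "__main__":
    docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    best_cost, best_labels = harmony_kmeans(docs, k=2)
    print(best_cost, best_labels)
```

In this sketch the harmony memory supplies the global exploration that plain K-means lacks, while the reassignment step keeps each improvised solution close to a local optimum. Real document vectors would normally be term-weighted and compared with cosine similarity rather than the Euclidean distance assumed here.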