Harmony K-means algorithm for document clustering

Authors:
Mehrdad Mahdavi; Hassan Abolhassani
Affiliations:
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran; Department of Computer Engineering, Sharif University of Technology, Tehran, Iran and School of Computer Science, Institute for Studies in Theoretical Physics and Mathematics (IPM), Tehran, Iran
Venue:
Data Mining and Knowledge Discovery
Year:
2009

Abstract

Fast, high-quality document clustering is a crucial task in organizing information and search engine results, enhancing web crawling, and supporting information retrieval or filtering. Recent studies have shown that the most commonly used partition-based clustering algorithm, K-means, is better suited to large datasets. However, K-means can converge to a locally optimal solution. In this paper we propose a novel Harmony K-means Algorithm (HKA) that performs document clustering based on the Harmony Search (HS) optimization method. We prove by means of finite Markov chain theory that HKA converges to the global optimum. To demonstrate the effectiveness and speed of HKA, we apply it to several standard datasets. We also compare HKA with other meta-heuristic and model-based document clustering approaches. Experimental results show that HKA converges to the best known optimum faster than the other methods and that the quality of the resulting clusters is comparable.
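
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a harmony search loop might be combined with a K-means refinement step. It is not the authors' HKA. Every detail here is an assumption made for illustration: the encoding of a harmony as one cluster label per document, the squared Euclidean cost, the pitch adjustment implemented as a random relabel, and all names and parameter values (`harmony_kmeans`, `hms`, `hmcr`, `par`, `iters`).

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids_of(labels, points, k):
    """Mean vector of each cluster; None marks an empty cluster."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for label, p in zip(labels, points):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += p[d]
    return [[s / counts[c] for s in sums[c]] if counts[c] else None
            for c in range(k)]


def cost(labels, points, k):
    """Sum of squared distances from each point to its own cluster centroid."""
    cents = centroids_of(labels, points, k)
    return sum(sq_dist(p, cents[label]) for label, p in zip(labels, points))


def kmeans_step(labels, points, k):
    """One K-means pass: move every point to its nearest non-empty centroid."""
    cents = centroids_of(labels, points, k)
    live = [c for c in range(k) if cents[c] is not None]
    return [min(live, key=lambda c: sq_dist(p, cents[c])) for p in points]


def harmony_kmeans(points, k, hms=8, hmcr=0.9, par=0.3, iters=200, seed=0):
    """Illustrative hybrid: harmony search explores, K-means refines locally."""
    rng = random.Random(seed)
    n = len(points)
    # Harmony memory: hms candidate solutions, each one label per point (assumed encoding).
    memory = [[rng.randrange(k) for _ in range(n)] for _ in range(hms)]
    costs = [cost(h, points, k) for h in memory]
    for _ in range(iters):
        new = []
        for i in range(n):
            if rng.random() < hmcr:
                label = rng.choice(memory)[i]      # memory consideration
                if rng.random() < par:
                    label = rng.randrange(k)       # pitch adjustment (assumed: random relabel)
            else:
                label = rng.randrange(k)           # random selection
            new.append(label)
        new = kmeans_step(new, points, k)          # local refinement of the improvised harmony
        new_cost = cost(new, points, k)
        worst = max(range(hms), key=costs.__getitem__)
        if new_cost < costs[worst]:                # replace the worst harmony if improved
            memory[worst], costs[worst] = new, new_cost
    best = min(range(hms), key=costs.__getitem__)
    return memory[best], costs[best]
```

Calling `harmony_kmeans(points, k)` with a list of equal-length numeric tuples returns a label list and its cost. For real document vectors, a cosine-based objective over term-weighted vectors would likely be a better fit than the squared Euclidean cost assumed here.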