Clustering has become an increasingly important and highly complicated research area for targeting useful and relevant information in modern application domains such as the World Wide Web. Recent studies have shown that the most commonly used partitioning-based clustering algorithm, the K-means algorithm, is more suitable for large datasets. However, the K-means algorithm may converge to a locally optimal clustering. In this paper, we present novel document clustering algorithms based on the Harmony Search (HS) optimization method. By modeling clustering as an optimization problem, we first propose a pure HS-based clustering algorithm that finds near-optimal clusters within a reasonable time. Then, harmony clustering is integrated with the K-means algorithm in three ways to achieve better clustering by combining the explorative power of HS with the refining power of K-means. In contrast to the localized search of the K-means algorithm, the proposed algorithms perform a globalized search over the entire solution space. Additionally, the proposed algorithms improve K-means by making it less dependent on initial parameters such as randomly chosen initial cluster centers, and therefore more stable. The behavior of the proposed algorithm is theoretically analyzed by modeling its population variance as a Markov chain. We also conduct an empirical study to determine the impact of various parameters on cluster quality and on the convergence behavior of the algorithms. In the experiments, we apply the proposed algorithms, along with K-means and a Genetic Algorithm (GA) based clustering algorithm, to five different document datasets. Experimental results reveal that the proposed algorithms can find better clusters, with cluster quality compared on the basis of F-measure, Entropy, Purity, and Average Distance of Documents to the Cluster Centroid (ADDC).
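To make the idea of clustering as an optimization problem concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes a harmony is a vector of cluster labels (one per document), uses an ADDC-style objective with Euclidean distance, and treats pitch adjustment as a shift to a neighboring cluster label. The function names (`harmony_cluster`, `kmeans_refine`), the parameter defaults, and the toy data are all hypothetical, and the final K-means pass illustrates only one possible way of combining the two methods, not any of the three hybridizations mentioned above.

```python
import math
import random


def centroids(docs, labels, k):
    """Mean vector of each cluster; an empty cluster yields None."""
    dim = len(docs[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for vec, c in zip(docs, labels):
        counts[c] += 1
        for j, x in enumerate(vec):
            sums[c][j] += x
    return [[s / n for s in row] if n else None for row, n in zip(sums, counts)]


def addc(docs, labels, k):
    """ADDC-style objective: per-cluster mean distance to centroid, averaged over clusters."""
    cents = centroids(docs, labels, k)
    if any(c is None for c in cents):
        return math.inf  # assumption: solutions with empty clusters are rejected
    totals, counts = [0.0] * k, [0] * k
    for vec, c in zip(docs, labels):
        totals[c] += math.dist(vec, cents[c])
        counts[c] += 1
    return sum(t / n for t, n in zip(totals, counts)) / k


def harmony_cluster(docs, k, hms=10, hmcr=0.9, par=0.3, iters=1000, seed=0):
    """Harmony Search over label vectors; returns (best_labels, best_score)."""
    rng = random.Random(seed)
    n = len(docs)
    # Harmony memory: hms random label vectors and their objective values.
    memory = [[rng.randrange(k) for _ in range(n)] for _ in range(hms)]
    scores = [addc(docs, h, k) for h in memory]
    for _ in range(iters):
        new = []
        for i in range(n):
            if rng.random() < hmcr:
                label = memory[rng.randrange(hms)][i]  # memory consideration
                if rng.random() < par:
                    # Simplified pitch adjustment: move to a neighboring label.
                    label = (label + rng.choice((-1, 1))) % k
            else:
                label = rng.randrange(k)  # random selection
            new.append(label)
        score = addc(docs, new, k)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:  # replace the worst harmony if improved
            memory[worst], scores[worst] = new, score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def kmeans_refine(docs, labels, k, steps=10):
    """Refine a labeling with K-means reassignment steps until it stops changing."""
    labels = list(labels)
    for _ in range(steps):
        cents = centroids(docs, labels, k)
        live = [c for c in range(k) if cents[c] is not None]
        new = [min(live, key=lambda c: math.dist(vec, cents[c])) for vec in docs]
        if new == labels:
            break
        labels = new
    return labels


# Toy "documents": two well-separated groups of 2-D vectors (hypothetical data).
toy_docs = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
hs_labels, hs_score = harmony_cluster(toy_docs, k=2)
final_labels = kmeans_refine(toy_docs, hs_labels, k=2)
print(final_labels, round(hs_score, 3))
```

In this sketch the global search comes from the harmony memory and random selection, while the K-means pass only polishes the best harmony found. Real document vectors would normally be high-dimensional term-weight vectors, and a different distance or similarity measure may be more appropriate there.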