Efficient stochastic algorithms for document clustering

Authors:
Rana Forsati; Mehrdad Mahdavi; Mehrnoush Shamsfard; Mohammad Reza Meybodi
Affiliations:
Faculty of Electrical and Computer Engineering, Shahid Beheshti University, G. C., Tehran, Iran; Department of Computer Engineering, Sharif University of Technology, Tehran, Iran; Faculty of Electrical and Computer Engineering, Shahid Beheshti University, G. C., Tehran, Iran; Department of Computer Engineering and IT, Amirkabir University of Technology, Tehran, Iran and Institute for Studies in Theoretical Physics and Mathematics (IPM), School of Computer Science, Tehr ...
Venue:
Information Sciences: an International Journal
Year:
2013

Abstract

Clustering has become an increasingly important and highly complicated research area for finding useful and relevant information in modern application domains such as the World Wide Web. Recent studies have shown that the most commonly used partitioning-based clustering algorithm, K-means, is well suited to large datasets. However, K-means may converge to a locally optimal clustering. In this paper, we present novel document clustering algorithms based on the Harmony Search (HS) optimization method. By modeling clustering as an optimization problem, we first propose a pure HS-based clustering algorithm that finds near-optimal clusters within a reasonable time. Then, harmony clustering is integrated with K-means in three ways to achieve better clustering by combining the explorative power of HS with the refining power of K-means. Unlike the localized search of K-means, the proposed algorithms perform a global search over the entire solution space. Additionally, the proposed algorithms improve K-means by making it less dependent on initial parameters such as randomly chosen initial cluster centers, which makes it more stable. The behavior of the proposed algorithm is analyzed theoretically by modeling its population variance as a Markov chain. We also conduct an empirical study to determine the impact of various parameters on cluster quality and on the convergence behavior of the algorithms. In the experiments, we apply the proposed algorithms, along with K-means and a Genetic Algorithm (GA)-based clustering algorithm, to five different document datasets. Experimental results, evaluated with F-measure, Entropy, Purity, and Average Distance of Documents to the Cluster Centroid (ADDC), reveal that the proposed algorithms can find better clusters.
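The abstract describes a pure HS clustering algorithm and its combination with K-means. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation. It assumes each harmony encodes one cluster label per document, uses ADDC as the fitness to minimize, and uses Euclidean distance on toy 2-D vectors in place of real TF-IDF document vectors. It also simplifies pitch adjustment to a random relabel and refines the best harmony with K-means, which is only one plausible way to hybridize the two methods. The parameter names `hms`, `hmcr`, and `par` follow common HS terminology (harmony memory size, harmony memory considering rate, pitch adjusting rate), and their values here are arbitrary. A `purity` helper illustrates one of the evaluation measures the abstract mentions.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Sequence

Vector = List[float]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance (assumption; document clustering often uses cosine instead).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroids(docs: List[Vector], labels: List[int], k: int) -> List[Optional[Vector]]:
    # Mean vector of each cluster; None marks an empty cluster.
    dim = len(docs[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for d, c in zip(docs, labels):
        counts[c] += 1
        for j, x in enumerate(d):
            sums[c][j] += x
    return [[s / counts[c] for s in sums[c]] if counts[c] else None for c in range(k)]


def addc(docs: List[Vector], labels: List[int], k: int) -> float:
    # ADDC: per-cluster mean distance to the centroid, averaged over non-empty clusters.
    cents = centroids(docs, labels, k)
    per_cluster = []
    for c in range(k):
        members = [d for d, lab in zip(docs, labels) if lab == c]
        if members:
            per_cluster.append(sum(dist(d, cents[c]) for d in members) / len(members))
    return sum(per_cluster) / len(per_cluster)


def harmony_search(docs, k, hms=10, hmcr=0.9, par=0.3, iters=500, seed=0):
    # Each harmony is a cluster label per document (hypothetical encoding).
    rng = random.Random(seed)
    n = len(docs)
    memory = [[rng.randrange(k) for _ in range(n)] for _ in range(hms)]
    fitness = [addc(docs, h, k) for h in memory]
    for _ in range(iters):
        new = []
        for j in range(n):
            if rng.random() < hmcr:
                label = memory[rng.randrange(hms)][j]  # memory consideration
                if rng.random() < par:
                    label = rng.randrange(k)  # simplified pitch adjustment
            else:
                label = rng.randrange(k)  # random selection
            new.append(label)
        f = addc(docs, new, k)
        worst = max(range(hms), key=fitness.__getitem__)
        if f < fitness[worst]:  # lower ADDC is better: replace the worst harmony
            memory[worst], fitness[worst] = new, f
    return memory[min(range(hms), key=fitness.__getitem__)]


def kmeans_refine(docs, labels, k, iters=20):
    # Lloyd-style K-means started from the HS labels (one possible hybridization).
    for _ in range(iters):
        cents = centroids(docs, labels, k)
        live = [c for c in range(k) if cents[c] is not None]
        new = [min(live, key=lambda c: dist(d, cents[c])) for d in docs]
        if new == labels:
            break
        labels = new
    return labels


def purity(labels: List[int], classes: List[int]) -> float:
    # Fraction of documents that fall in the majority class of their cluster.
    groups = {}
    for lab, cls in zip(labels, classes):
        groups.setdefault(lab, Counter())[cls] += 1
    return sum(max(cnt.values()) for cnt in groups.values()) / len(labels)


if __name__ == "__main__":
    docs = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]
    classes = [0, 0, 0, 1, 1, 1]
    labels = kmeans_refine(docs, harmony_search(docs, k=2), k=2)
    print(labels, purity(labels, classes))
```

Running the script prints the final labels and their purity. On this toy data the two well-separated groups should be recovered; with real document vectors the outcome depends on the seed, the parameters, and the distance measure. HS explores whole labelings globally, and K-means then performs the local refinement, which mirrors the division of labor the abstract describes.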