A fast and effective partitioning algorithm for document clustering

Authors:
Rajeev Kumar;Alok Ranjan;Joydip Dhar
Affiliations:
Department of Information Technology, ABV - Indian Institute of Information Technology and Management, Gwalior, India;Department of Information Technology, ABV - Indian Institute of Information Technology and Management, Gwalior, India;Department of Applied Sciences, ABV - Indian Institute of Information Technology and Management, Gwalior, India
Venue:
ICDEM'10 Proceedings of the Second international conference on Data Engineering and Management
Year:
2010

Abstract

Fast, high-quality document clustering is one of the most important tasks in the modern information era. Given the huge amount of available data and the aim of creating better-quality clusters, scores of algorithms with different quality-complexity trade-offs have been proposed. Some of these algorithms attempt to minimize the computational overhead in terms of certain criterion functions defined over the whole clustering solution. In this paper, we propose a novel algorithm for document clustering that uses a graph-based criterion function. Our algorithm is a partitioning algorithm. Most commonly used partitioning clustering algorithms suffer from the drawback of becoming trapped in locally optimal solutions. The algorithm proposed in this paper, however, usually leads to the globally optimal solution, and its performance improves as the number of clusters increases. We have carried out detailed experiments comparing our algorithm with two well-known document clustering algorithms, namely k-means and k-means++. The results obtained confirm the superiority of our algorithm.
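For readers unfamiliar with the k-means++ baseline named in the abstract, the following is a minimal sketch of its seeding step, in which each new center is sampled with probability proportional to the squared distance from the nearest center already chosen. It is not the graph-based algorithm proposed in the paper, which the abstract does not specify. The function names, the use of squared Euclidean distance, and the toy term-count vectors are all assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """Pick k initial centers using k-means++ (D^2-weighted) seeding.

    Hypothetical helper for illustration; `rng` is a random.Random instance.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: chosen uniformly at random.
    centers = [list(rng.choice(points))]
    # nearest[i] is the squared distance from points[i] to its closest center.
    nearest = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(nearest) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            new_center = rng.choice(points)
        else:
            # Sample a point with probability proportional to nearest[i].
            new_center = rng.choices(points, weights=nearest, k=1)[0]
        centers.append(list(new_center))
        nearest = [
            min(d, squared_distance(p, new_center))
            for p, d in zip(points, nearest)
        ]
    return centers


# Toy term-count vectors standing in for documents (made-up data).
docs = [[2, 0, 1], [3, 0, 0], [0, 2, 2], [0, 3, 1]]
seeds = kmeans_pp_seeds(docs, 2, random.Random(0))
```

The returned seeds would then be passed to an ordinary k-means iteration. Because a point that is already a center has weight zero, distinct input points are never selected twice.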