Performance evaluation of density-based clustering methods

Authors:
Ramiz M. Aliguliyev
Affiliations:
Institute of Information Technology of Azerbaijan National Academy of Sciences, Department of Artificial Intelligence and Computer Sciences, 9, F. Agayev Street, Baku AZ1141, Azerbaijan
Venue:
Information Sciences: an International Journal
Year:
2009

Abstract

With the growth of the World Wide Web, document clustering is receiving increasing attention as a fundamental technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. A good document clustering approach can help computers organize a document corpus automatically into a meaningful cluster hierarchy for efficient browsing and navigation, which complements the deficiencies of traditional information retrieval technologies. In this paper, we study the performance of different density-based criterion functions, which can be classified as internal, external or hybrid, in the context of partitional clustering of document datasets. In our study, each document was assigned a weight that defined its relative position in the entire collection. To show the efficiency of the proposed approach, the weighted methods were compared with their unweighted variants. To verify the robustness of the proposed approach, experiments were conducted on datasets covering a wide range of numbers of clusters, documents and terms. To evaluate the criterion functions, we used the WebKB, Reuters-21578, 20Newsgroups-18828, WebACE and TREC-5 datasets, as they are currently the most widely used benchmarks in document clustering research. To evaluate the quality of a clustering solution, we used a wide spectrum of indices: three internal validity indices and seven external validity indices. The internal validity indices were used to evaluate within-cluster scatter and between-cluster separation. The external validity indices were used to compare the clustering solutions produced by the proposed criterion functions with the "ground truth" results. Experiments showed that our approach significantly improves clustering quality. To optimize the criterion functions, we developed a modified differential evolution (DE) algorithm. This modification accelerates the convergence of DE and, unlike the basic DE algorithm, guarantees that the resulting solution is feasible.
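
The distinction the abstract draws between internal and external validity indices can be illustrated with a minimal sketch. The abstract does not name the ten indices used in the paper, so the three measures below are stand-ins chosen for illustration: purity as an external index, and mean distance to the cluster centroid and minimum centroid-to-centroid distance as internal measures of scatter and separation. The function names, the use of Euclidean distance, and the toy data are all assumptions, not taken from the paper.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def purity(labels_true, labels_pred):
    """External index (illustrative): share of items that belong to the
    majority ground-truth class of the cluster they were assigned to."""
    clusters = {}
    for truth, pred in zip(labels_true, labels_pred):
        clusters.setdefault(pred, []).append(truth)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in clusters.values()
    )
    return majority_total / len(labels_true)


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def scatter_and_separation(points, labels_pred):
    """Internal measures (illustrative): mean distance of each point to its
    own cluster centroid, and the smallest distance between two centroids.
    Needs at least two clusters; no ground truth is used."""
    groups = {}
    for x, pred in zip(points, labels_pred):
        groups.setdefault(pred, []).append(x)
    centroids = {pred: centroid(xs) for pred, xs in groups.items()}
    scatter = sum(
        dist(x, centroids[pred]) for x, pred in zip(points, labels_pred)
    ) / len(points)
    keys = list(centroids)
    separation = min(
        dist(centroids[a], centroids[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    )
    return scatter, separation


# Toy data (hypothetical): four 2-D points, two ground-truth classes.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
truth = ["a", "a", "b", "b"]
good = [0, 0, 1, 1]   # clustering that matches the ground truth
mixed = [0, 1, 0, 1]  # clustering that splits every class

print(purity(truth, good), purity(truth, mixed))
print(scatter_and_separation(points, good))
```

A lower scatter together with a higher separation indicates compact, well-separated clusters, while purity can only be computed when class labels are available. For real document vectors a cosine-based distance would likely be a more natural choice than the Euclidean distance assumed here.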