With the growth of the World Wide Web, document clustering has received increasing attention as a fundamental technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. A good document clustering approach helps computers organize a document corpus automatically into a meaningful cluster hierarchy for efficient browsing and navigation, which compensates for some of the shortcomings of traditional information retrieval technologies. In this paper, we study the performance of different density-based criterion functions, classified as internal, external or hybrid, in the context of partitional clustering of document datasets. Each document is assigned a weight that defines its relative position in the entire collection. To show the effectiveness of this approach, the weighted criterion functions were compared with their unweighted variants. To verify its robustness, experiments were conducted on datasets with widely varying numbers of clusters, documents and terms. The criterion functions were evaluated on the WebKb, Reuters-21578, 20Newsgroups-18828, WebACE and TREC-5 datasets, which are among the most widely used benchmarks in document clustering research. Clustering quality was assessed with ten validity indices: three internal indices, which measure within-cluster scatter and between-cluster separation, and seven external indices, which compare the clustering solutions produced by the proposed criterion functions with the ground-truth partitions. To optimize the criterion functions, we developed a modified differential evolution (DE) algorithm. This modification accelerates the convergence of DE and, unlike the basic DE algorithm, guarantees that the solution obtained is feasible. Experiments showed that our approach significantly improves clustering quality.
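The abstract does not give the formulas for its weighted criterion functions, so the following is only a minimal sketch of the general idea under stated assumptions. It scores a partition with an I2-style internal criterion (the sum over clusters of the norm of the composite vector of unit-length documents) and lets each document contribute with a weight. The weighting rule in the hypothetical `position_weights` (cosine similarity to the collection centroid) is an illustrative assumption for "relative position in the entire collection", not the paper's definition; `i2_criterion` and `normalize_rows` are likewise hypothetical names.

```python
import numpy as np

def normalize_rows(X):
    """Scale each document vector to unit Euclidean length (all-zero rows stay zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return X / norms

def position_weights(X):
    """HYPOTHETICAL document weight: cosine similarity between each document and
    the centroid of the whole collection, rescaled so the weights average to 1."""
    D = normalize_rows(X)
    centroid = D.mean(axis=0)
    sims = D @ centroid / (np.linalg.norm(centroid) or 1.0)
    sims = np.clip(sims, 1e-12, None)  # keep every weight strictly positive
    return sims * len(sims) / sims.sum()

def i2_criterion(X, labels, k, weights=None):
    """I2-style internal criterion: sum over clusters of the norm of the
    (optionally weighted) composite vector of the cluster's unit-length documents.
    Larger is better; weights=None gives the unweighted variant."""
    D = normalize_rows(X)
    if weights is None:
        weights = np.ones(len(D))
    total = 0.0
    for r in range(k):
        members = labels == r
        composite = (weights[members][:, None] * D[members]).sum(axis=0)
        total += np.linalg.norm(composite)
    return total
```

With `X` a small term-frequency matrix and `labels` an array of cluster ids, `i2_criterion(X, labels, k)` and `i2_criterion(X, labels, k, position_weights(X))` give the unweighted and weighted scores of the same partition, which is the kind of side-by-side comparison the abstract describes.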
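The three internal validity indices are not named in the abstract. As one familiar example of an index that combines within-cluster scatter with between-cluster separation, the sketch below computes the Davies-Bouldin index with Euclidean distances; it is an illustration of the kind of measurement described, not necessarily one of the indices the authors used.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index: average over clusters of the worst-case ratio
    (scatter_i + scatter_j) / separation_ij. Lower is better.
    Assumes at least two clusters with distinct centroids."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in ids])
    # Within-cluster scatter: mean Euclidean distance of members to their centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(ids)
    ])
    k = len(ids)
    worst = np.zeros(k)
    for i in range(k):
        # Between-cluster separation: distance between centroids i and j.
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k) if j != i
        ]
        worst[i] = max(ratios)
    return worst.mean()
```

A compact, well-separated partition yields a small value, while a partition that mixes distant documents yields a large one. scikit-learn's `sklearn.metrics.davies_bouldin_score` implements the same index if a library version is preferred.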
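External indices compare a clustering solution with the ground-truth class labels. The abstract does not list its seven external indices, so the sketch below shows two common ones, purity and the (unadjusted) Rand index, purely as examples of how such a comparison works.

```python
from collections import Counter
from itertools import combinations

def purity(pred, truth):
    """Fraction of documents that belong to the majority ground-truth class of their cluster."""
    clusters = {}
    for p, t in zip(pred, truth):
        clusters.setdefault(p, []).append(t)
    majority = sum(Counter(ts).most_common(1)[0][1] for ts in clusters.values())
    return majority / len(pred)

def rand_index(pred, truth):
    """Share of document pairs on which the clustering and the ground truth agree:
    both place the pair in the same group, or both place it in different groups."""
    pairs = list(combinations(range(len(pred)), 2))
    agree = sum((pred[i] == pred[j]) == (truth[i] == truth[j]) for i, j in pairs)
    return agree / len(pairs)
```

Both indices reach 1.0 when the clustering reproduces the ground-truth partition exactly. The Rand index is not corrected for chance, which is why adjusted variants (for example `sklearn.metrics.adjusted_rand_score`) are often reported alongside it.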
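The abstract does not describe how its modified DE works, so the following is not the authors' algorithm. It is a minimal sketch, under assumptions, of one generic way to keep DE solutions feasible for partitional clustering: standard DE/rand/1/bin over real-valued genes in [0, 1), a hypothetical encoding (`decode`) that maps each gene to a cluster label, and a hypothetical repair step (`repair`) that moves a document into any empty cluster so that every one of the k clusters is non-empty. The within-cluster sum of squares (`sse`) stands in for the paper's density-based criterion functions, and the sketch makes no claim about convergence speed. It assumes at least k documents and a population of at least four.

```python
import numpy as np

def decode(x, k):
    """HYPOTHETICAL encoding: map genes in [0, 1) to cluster labels 0..k-1."""
    return np.minimum((x * k).astype(int), k - 1)

def repair(x, k, rng):
    """HYPOTHETICAL repair step: give every empty cluster one document taken from
    the current largest cluster, resetting that gene to the centre of the target
    cluster's interval. Requires len(x) >= k."""
    x = x.copy()
    labels = decode(x, k)
    for c in range(k):
        if not np.any(labels == c):
            donor = int(np.argmax(np.bincount(labels, minlength=k)))
            idx = rng.choice(np.flatnonzero(labels == donor))
            x[idx] = (c + 0.5) / k
            labels[idx] = c
    return x

def sse(X, labels, k):
    """Stand-in fitness: within-cluster sum of squared distances to centroids (lower is better)."""
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum() for c in range(k))

def de_cluster(X, k, pop_size=20, F=0.5, CR=0.9, generations=100, seed=0):
    """DE/rand/1/bin with a repair step after every crossover, so all solutions stay feasible."""
    rng = np.random.default_rng(seed)
    n = len(X)
    pop = np.array([repair(rng.random(n), k, rng) for _ in range(pop_size)])
    cost = np.array([sse(X, decode(p, k), k) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals other than the target i.
            a, b, c = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
            mutant = np.clip(pop[a] + F * (pop[b] - pop[c]), 0.0, 1.0 - 1e-9)
            # Binomial crossover; at least one gene is taken from the mutant.
            mask = rng.random(n) < CR
            mask[rng.integers(n)] = True
            trial = repair(np.where(mask, mutant, pop[i]), k, rng)
            trial_cost = sse(X, decode(trial, k), k)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = int(np.argmin(cost))
    return decode(pop[best], k), cost[best]
```

Because repair runs after every crossover, crossover and mutation can never leave a cluster empty in the population, which is the feasibility property the abstract contrasts with basic DE. A real implementation would replace `sse` with the chosen criterion function and adjust the selection direction if that criterion is maximized.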