Clustering short-text collections is an increasingly relevant research area, given the current and growing tendency of people to communicate in “small-language” (e.g., blogs, snippets, news, and generated text messages such as email or chat). In recent years, a few approaches based on Particle Swarm Optimization (PSO) have been proposed to solve document clustering problems. However, the particular difficulties that arise when such approaches are used to cluster corpora of very short documents have received little attention from the computational linguistics community, perhaps because of how challenging the problem is. In this work, we propose several variants of PSO methods for this kind of corpus. Our proposal includes two very different approaches to the clustering problem, which differ essentially in the representation used to maintain information about the clusterings under consideration. In both, we optimize one of two unsupervised measures of cluster validity: the Expected Density Measure $\bar{\rho}$ and the Global Silhouette coefficient. In recent work on short-text clustering, these measures have shown an interesting level of correlation with the “true” categorizations provided by a human expert. The experimental results show that PSO-based approaches can be highly competitive alternatives for clustering short-text corpora and can, in some cases, outperform the most effective clustering algorithms used in this area.
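To make the second validity measure concrete, the following is a minimal sketch of one common formulation of the Global Silhouette coefficient: the silhouette of each object is averaged within its cluster, and the cluster averages are then averaged. The abstract does not state the exact formulation used, so this definition, the function name `global_silhouette`, and the toy distance matrix are assumptions made for illustration only.

```python
from collections import defaultdict


def global_silhouette(dist, labels):
    """Global Silhouette of a clustering (illustrative formulation).

    dist   -- square matrix (list of lists) of pairwise distances
    labels -- cluster id of each object, in the same order as dist
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, members):
        return sum(dist[i][j] for j in members) / len(members)

    cluster_scores = []
    for lab, members in clusters.items():
        scores = []
        for i in members:
            others = [j for j in members if j != i]
            if not others:
                # Convention: an object alone in its cluster scores 0.
                scores.append(0.0)
                continue
            a = mean_dist(i, others)  # cohesion: own cluster
            b = min(mean_dist(i, m)   # separation: nearest other cluster
                    for other, m in clusters.items() if other != lab)
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
        cluster_scores.append(sum(scores) / len(scores))
    # Average of the per-cluster averages.
    return sum(cluster_scores) / len(cluster_scores)


# Toy data (hypothetical): four points on a line, two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]

good = global_silhouette(dist, [0, 0, 1, 1])  # groups match the gaps
bad = global_silhouette(dist, [0, 1, 0, 1])   # groups cut across the gaps
print(good > bad)
```

Because the measure needs only pairwise distances and a cluster assignment, a search method such as PSO can use it directly as a fitness function over candidate clusterings, with values closer to 1 indicating compact, well-separated clusters.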