An efficient Particle Swarm Optimization approach to cluster short texts

Authors:
Leticia Cagnina; Marcelo Errecalde; Diego Ingaramo; Paolo Rosso
Venue:
Information Sciences: an International Journal
Year:
2014

Citing 28
Cited 0

Abstract

Short texts such as evaluations of commercial products, news, FAQs and scientific abstracts are important resources on the Web, because people constantly need to use this online information in real life. In this context, clustering short texts is a significant analysis task, and a discrete Particle Swarm Optimization (PSO) algorithm named CLUDIPSO has recently shown promising performance on this type of problem. CLUDIPSO obtained high-quality results with small corpora, but its performance deteriorated significantly with larger corpora. This article presents CLUDIPSO*, an improved version of CLUDIPSO that includes a different representation of particles, a more efficient evaluation of the function to be optimized, and some modifications to the mutation operator. Experimental results with corpora of scientific abstracts, news and short legal documents obtained from the Web show that CLUDIPSO* is an effective clustering method for short-text corpora of small and medium size.
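The abstract names the ingredients of CLUDIPSO* only at a high level: a discrete PSO whose particles encode clusterings, an objective function evaluated for every particle, and a mutation operator. The following Python sketch illustrates those ingredients under stated assumptions. It is not the authors' algorithm.

The sketch makes these assumptions:

* Each particle is a label vector, where entry i is the cluster of text i. CLUDIPSO* may use a different representation.
* The objective is a hypothetical I1-style criterion: within-cluster pairwise cosine similarity divided by cluster size, summed over clusters. This is not necessarily the function the paper optimizes.
* The position update is a simple probabilistic copy-from-best rule, not CLUDIPSO's own update.
* The only efficiency measure shown is the generic one of precomputing the similarity matrix once.

All function and parameter names (`cosine_matrix`, `i1_fitness`, `discrete_pso`, `c1`, `c2`, `p_mut`) are hypothetical.

```python
import math
import random


def cosine_matrix(vectors):
    """Precompute all pairwise cosine similarities once, so each fitness
    evaluation is only table lookups (a generic trick, assumed here)."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            s = dot / (norms[i] * norms[j]) if norms[i] and norms[j] else 0.0
            sim[i][j] = sim[j][i] = s
    return sim


def i1_fitness(labels, sim, k):
    """Hypothetical objective (higher is better): for each non-empty cluster,
    sum sim over all ordered pairs of its members, divided by its size."""
    members = [[] for _ in range(k)]
    for doc, c in enumerate(labels):
        members[c].append(doc)
    total = 0.0
    for docs in members:
        if docs:
            within = sum(sim[i][j] for i in docs for j in docs)
            total += within / len(docs)
    return total


def discrete_pso(sim, k, n_particles=20, iterations=100,
                 c1=0.3, c2=0.4, p_mut=0.1, seed=0):
    """Assumed encoding: a particle is a label vector (entry i = cluster of text i)."""
    rng = random.Random(seed)
    n = len(sim)
    swarm = [[rng.randrange(k) for _ in range(n)] for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [i1_fitness(p, sim, k) for p in swarm]
    g = max(range(n_particles), key=lambda idx: pbest_fit[idx])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(iterations):
        for idx, particle in enumerate(swarm):
            for i in range(n):
                r = rng.random()
                if r < c1:
                    particle[i] = pbest[idx][i]  # copy from personal best
                elif r < c1 + c2:
                    particle[i] = gbest[i]       # copy from global best
                # Mutation (assumed form): move text i to a different cluster.
                if k > 1 and rng.random() < p_mut:
                    particle[i] = rng.choice([c for c in range(k) if c != particle[i]])
            fit = i1_fitness(particle, sim, k)
            if fit > pbest_fit[idx]:
                pbest[idx], pbest_fit[idx] = particle[:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = particle[:], fit
    return gbest, gbest_fit


if __name__ == "__main__":
    # Toy "texts": 2-D term vectors forming two well-separated groups.
    texts = [[1.0, 0.0], [0.95, 0.05], [0.9, 0.1],
             [0.0, 1.0], [0.05, 0.95], [0.1, 0.9]]
    labels, score = discrete_pso(cosine_matrix(texts), k=2)
    print(labels, round(score, 3))
```

On this toy input the swarm should place each group in its own cluster. In a real short-text setting the vectors would come from a term-weighting scheme such as TF-IDF. Any speed-ups CLUDIPSO* reports come from its own changes, which this sketch does not reproduce.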