ITSA*: an effective iterative method for short-text clustering tasks

  • Authors:
  • Marcelo Errecalde;Diego Ingaramo;Paolo Rosso

  • Affiliations:
  • LIDIC, Universidad Nacional de San Luis, Argentina;LIDIC, Universidad Nacional de San Luis, Argentina;Natural Language Eng. Lab. ELiRF, DSIC, Universidad Politécnica de Valencia, Spain

  • Venue:
  • IEA/AIE'10 Proceedings of the 23rd international conference on Industrial engineering and other applications of applied intelligent systems - Volume Part I
  • Year:
  • 2010

Abstract

The growing use of very short documents, such as blogs, text messages and news, has increased interest in automatic processing techniques that can handle documents with these characteristics. In this context, short-text clustering is an important research field in which new clustering algorithms have recently been proposed to deal with this difficult problem. This work proposes ITSA*, an iterative method based on the bio-inspired method PAntSA*, for this task. ITSA* takes as input the results obtained by arbitrary clustering algorithms and refines them by iteratively applying the PAntSA* algorithm. The proposal improves the results obtained with different algorithms on several short-text collections. Moreover, ITSA* is useful not only as an improvement method: starting from random initial clusterings, it outperforms well-known clustering algorithms in most of the experimental instances.
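
The general idea of taking an existing clustering and refining it repeatedly can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not describe the internals of PAntSA*, so the refinement step below (`reassign_step`) is a simple stand-in that moves each document to the cluster whose members are most similar to it on average. The names `reassign_step`, `quality`, `iterative_refine`, the similarity matrix `sim`, and the stopping rule are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Labels = List[int]                    # labels[i] is the cluster of document i
Matrix = Sequence[Sequence[float]]    # pairwise document similarities


def group(labels: Labels) -> Dict[int, List[int]]:
    """Map each cluster label to the indices of its documents."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    return clusters


def mean_similarity(i: int, members: Sequence[int], sim: Matrix) -> float:
    """Average similarity between document i and the other members."""
    others = [j for j in members if j != i]
    if not others:
        return 0.0
    return sum(sim[i][j] for j in others) / len(others)


def reassign_step(labels: Labels, sim: Matrix) -> Labels:
    """Stand-in refinement step (an assumption, NOT PAntSA*): assign each
    document to the cluster with the highest average similarity to it."""
    clusters = group(labels)
    return [
        max(clusters, key=lambda lab: mean_similarity(i, clusters[lab], sim))
        for i in range(len(labels))
    ]


def quality(labels: Labels, sim: Matrix) -> float:
    """Hypothetical quality score: mean within-cluster similarity."""
    clusters = group(labels)
    n = len(labels)
    return sum(mean_similarity(i, clusters[labels[i]], sim) for i in range(n)) / n


def iterative_refine(initial: Labels, sim: Matrix, max_iters: int = 20) -> Labels:
    """Apply the refinement step repeatedly, keeping a candidate only
    while it improves the quality score."""
    best = list(initial)
    best_score = quality(best, sim)
    for _ in range(max_iters):
        candidate = reassign_step(best, sim)
        candidate_score = quality(candidate, sim)
        if candidate_score <= best_score:
            break  # no further improvement: stop iterating
        best, best_score = candidate, candidate_score
    return best


# Toy data: documents 0-2 are mutually similar, as are documents 3-5.
sim = [[0.9 if (a < 3) == (b < 3) else 0.1 for b in range(6)] for a in range(6)]
initial = [0, 0, 1, 0, 1, 1]   # an imperfect starting clustering
result = iterative_refine(initial, sim)
print(result)
```

The starting clustering here could come from any algorithm or be random, mirroring the two usage modes the abstract mentions. A faithful reproduction would replace `reassign_step` with the actual PAntSA* procedure and `quality` with the validity measure used in the paper.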