Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance

  • Authors:
  • David Pinto; José-Miguel Benedí; Paolo Rosso

  • Affiliations:
  • Department of Information Systems and Computation, UPV, Valencia 46022, Camino de Vera s/n, Spain and Faculty of Computer Science, BUAP, Puebla 72570, Ciudad Universitaria, Mexico; Department of Information Systems and Computation, UPV, Valencia 46022, Camino de Vera s/n, Spain; Department of Information Systems and Computation, UPV, Valencia 46022, Camino de Vera s/n, Spain

  • Venue:
  • CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2009

Abstract

Clustering short texts is a difficult task in itself, and the narrow-domain characteristic poses an additional challenge for current clustering methods. We addressed this problem by using a new distance measure between documents based on the symmetric Kullback-Leibler distance. Although this measure is commonly used to calculate the distance between two probability distributions, we adapted it to obtain a distance value between two documents. We carried out experiments on two different narrow-domain corpora, and our findings indicate that this measure can be used for the addressed problem, yielding results comparable to those obtained with the Jaccard similarity measure.
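The abstract does not say how a document is turned into a probability distribution, so the following is only a minimal sketch under stated assumptions. Each document is assumed to be represented by its relative term frequencies over the union vocabulary of the two documents being compared. A small, hypothetical `epsilon` is assigned to terms that are absent from one document so that no logarithm of zero appears. The symmetric distance is taken as the sum D(P‖Q) + D(Q‖P), which is one of several symmetric variants. This smoothing choice is illustrative and not necessarily the adaptation used by the authors. The Jaccard similarity, used in the abstract as the point of comparison, is included for contrast.

```python
from collections import Counter
import math


def term_distribution(tokens, vocab, epsilon=1e-4):
    """Relative term frequencies over a shared vocabulary.

    Assumption: terms missing from the document get a small hypothetical
    `epsilon` mass, and the result is renormalized to sum to 1.
    """
    counts = Counter(tokens)
    total = len(tokens)
    raw = {t: (counts[t] / total if counts[t] else epsilon) for t in vocab}
    norm = sum(raw.values())
    return {t: p / norm for t, p in raw.items()}


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for dicts over the same keys."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def symmetric_kl(doc_a, doc_b, epsilon=1e-4):
    """Symmetric KL distance between two tokenized documents.

    Uses D(P || Q) + D(Q || P), one of several symmetric variants.
    """
    vocab = set(doc_a) | set(doc_b)
    p = term_distribution(doc_a, vocab, epsilon)
    q = term_distribution(doc_b, vocab, epsilon)
    return kl(p, q) + kl(q, p)


def jaccard(doc_a, doc_b):
    """Jaccard similarity between the term sets of two documents."""
    a, b = set(doc_a), set(doc_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Identical documents get a distance of 0.0, and the distance grows as the term distributions diverge. A clustering algorithm would consume the pairwise values of `symmetric_kl` (or of 1 - `jaccard` for the baseline) as its distance matrix. Note that with this smoothing the value depends on `epsilon`, because terms present in only one document dominate the sum.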