Relevance of contextual information in compression-based text clustering

  • Authors: Ana Granados; Rafael Martínez; David Camacho; Francisco de Borja Rodríguez
  • Affiliation (all authors): Escuela Politécnica Superior, Universidad Autónoma de Madrid, Spain
  • Venue: IDEAL'10: Proceedings of the 11th International Conference on Intelligent Data Engineering and Automated Learning
  • Year: 2010

Abstract

In this paper we take a step towards understanding compression distances by analyzing the relevance of contextual information in compression-based text clustering. To do so, we explore two kinds of word removal: one that maintains part of the contextual information despite the removal, and one that does not. We show that removing words while maintaining the contextual information helps compression-based text clustering and improves its accuracy, whereas removing words in a way that loses that contextual information makes the clustering results worse.
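
To make the two kinds of word removal concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the distance, the compressor, or the removal mechanism, so everything here is an assumption: the distance is taken to be the normalized compression distance (NCD), the compressor is Python's zlib, context is preserved by overwriting each removed word with asterisks of the same length, and the function names `ncd` and `remove_words` are hypothetical.

```python
import re
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Lower values mean more similar texts.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def remove_words(text: str, words: set, keep_context: bool) -> str:
    """Remove the given lowercase words from text.

    keep_context=True  : overwrite each removed word with asterisks of the
                         same length, so word positions and lengths survive
                         (assumed way of maintaining contextual information).
    keep_context=False : delete the word outright, losing that information.
    """
    def substitute(match):
        word = match.group(0)
        if word.lower() not in words:
            return word
        return "*" * len(word) if keep_context else ""

    return re.sub(r"[A-Za-z]+", substitute, text)


if __name__ == "__main__":
    doc_a = "the cat sat on the mat while the dog slept on the rug"
    doc_b = "the market fell on the news while the bond yields rose"
    stop = {"the", "on", "while"}  # illustrative word list, not from the paper

    for keep in (True, False):
        a = remove_words(doc_a, stop, keep)
        b = remove_words(doc_b, stop, keep)
        print(keep, repr(a), round(ncd(a, b), 3))
```

In a clustering experiment, the pairwise `ncd` values over a document collection would form the distance matrix passed to the clustering algorithm, computed once for each removal variant so that the two can be compared.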