Analysis and study on text representation to improve the accuracy of the normalized compression distance

  • Authors: Ana Granados
  • Affiliations: Department of Computer Science, Escuela Politécnica Superior, Universidad Autónoma de Madrid, Madrid, Spain. E-mail: ana.granadosf@uam.es
  • Venue: AI Communications
  • Year: 2012

Abstract

This thesis takes a small step towards better understanding both the nature of texts and the nature of compression distances. Broadly speaking, it does so by exploring the effects that several distortion techniques have on one of the most successful members of the family of compression distances, the Normalized Compression Distance (NCD). The experimental results show that changing the representation of texts by applying one of the explored distortion techniques can be beneficial in both NCD-based document clustering and NCD-based document search.
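
For readers unfamiliar with the NCD, the following is a minimal sketch of how it is commonly computed: NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed size of its argument. The choice of zlib as the compressor is an assumption made here for illustration. The abstract does not say which distortion techniques were explored, so the `distort` helper below is purely hypothetical: it masks a caller-supplied set of words with asterisks to show what "changing the representation of a text" before computing the NCD could look like.

```python
import re
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def distort(text: str, words: set) -> str:
    """Hypothetical distortion: replace each listed word with asterisks of equal length.

    This is an illustrative assumption, not a technique taken from the abstract.
    """
    def mask(match):
        word = match.group()
        return "*" * len(word) if word.lower() in words else word

    return re.sub(r"\w+", mask, text)


if __name__ == "__main__":
    a = "the quick brown fox jumps over the lazy dog " * 20
    b = "lorem ipsum dolor sit amet consectetur adipiscing elit " * 20
    print("NCD(a, a):", ncd(a.encode(), a.encode()))
    print("NCD(a, b):", ncd(a.encode(), b.encode()))
    masked = distort(a, {"the"})
    print("NCD(distorted a, b):", ncd(masked.encode(), b.encode()))
```

A text compared with itself should give a value close to 0, while unrelated texts should give a value closer to 1. Real compressors only approximate the ideal, so the exact figures depend on the compressor and the input length.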