A machine-translation method for normalization of SMS

  • Authors:
  • Darnes Vilariño; David Pinto; Beatriz Beltrán; Saul León; Esteban Castillo; Mireya Tovar

  • Affiliations:
  • Faculty of Computer Science, Benemérita Universidad Autónoma de Puebla, Mexico (all authors)

  • Venue:
  • MCPR'12 Proceedings of the 4th Mexican conference on Pattern Recognition
  • Year:
  • 2012

Abstract

Normalizing SMS text is an important task for the computational community, given the rapid growth of mobile services that rely on this kind of message. The particular writing style used in SMS limits how well these texts can be processed automatically. Beyond the problems already posed by such texts, we are also interested in tasks that require understanding the meaning of documents in different languages, which increases the complexity further. We propose normalizing SMS texts with machine translation techniques. To this end, we use a statistical bilingual dictionary, computed with the IBM-4 model, to determine the best translation for a given SMS term. We compared this approach with a traditional probabilistic information retrieval method and observed that the proposed normalization model substantially improves the performance of the probabilistic one.
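
The dictionary lookup described in the abstract can be illustrated with a minimal Python sketch. It assumes the statistical dictionary is already available as a mapping from each SMS term to candidate words with translation probabilities. The names `TOY_DICTIONARY`, `normalize_term`, and `normalize_sms` are hypothetical, and the probabilities are invented for illustration. They are not taken from the paper, and the sketch does not train an IBM model.

```python
from typing import Dict

# Hypothetical toy dictionary: SMS term -> {candidate word: translation probability}.
# In the approach described above, such probabilities would come from a
# statistical translation model; the values here are made up for illustration.
TOY_DICTIONARY: Dict[str, Dict[str, float]] = {
    "u": {"you": 0.9, "your": 0.1},
    "2moro": {"tomorrow": 0.95, "today": 0.05},
    "gr8": {"great": 0.8, "grate": 0.2},
}


def normalize_term(term: str, dictionary: Dict[str, Dict[str, float]]) -> str:
    """Return the most probable translation of a term, or the term itself if unknown."""
    candidates = dictionary.get(term.lower())
    if not candidates:
        return term  # out-of-dictionary terms are left unchanged
    # Pick the candidate with the highest translation probability.
    return max(candidates, key=candidates.get)


def normalize_sms(message: str, dictionary: Dict[str, Dict[str, float]]) -> str:
    """Normalize a whitespace-tokenized SMS message term by term."""
    return " ".join(normalize_term(token, dictionary) for token in message.split())


if __name__ == "__main__":
    print(normalize_sms("c u 2moro", TOY_DICTIONARY))
```

Choosing the single most probable entry per term ignores context. That is a simplification made in this sketch, not a claim about how the authors resolve ambiguous terms.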