Normalizing SMS: are two metaphors better than one?

  • Authors:
  • Catherine Kobus;François Yvon;Géraldine Damnati

  • Affiliations:
  • Orange Labs, Lannion;Univ Paris-Sud & LIMSI/CNRS, Orsay Cedex;Orange Labs, Lannion

  • Venue:
  • COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Electronic written texts used in computermediated interactions (e-mails, blogs, chats, etc) present major deviations from the norm of the language. This paper presents an comparative study of systems aiming at normalizing the orthography of French SMS messages: after discussing the linguistic peculiarities of these messages, and possible approaches to their automatic normalization, we present, evaluate and contrast two systems, one drawing inspiration from the Machine Translation task; the other using techniques that are commonly used in automatic speech recognition devices. Combining both approaches, our best normalization system achieves about 11% Word Error Rate on a test set of about 3000 unseen messages.