A hybrid rule/model-based finite-state framework for normalizing SMS messages

  • Authors:
  • Richard Beaufort; Sophie Roekhaut; Louise-Amélie Cougnon; Cédrick Fairon

  • Affiliations:
  • Université catholique de Louvain, Louvain-la-Neuve, Belgium; Université de Mons, Mons, Belgium; Université catholique de Louvain, Louvain-la-Neuve, Belgium; Université catholique de Louvain, Louvain-la-Neuve, Belgium

  • Venue:
  • ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 2010

Abstract

In recent years, research in natural language processing has increasingly focused on normalizing SMS messages. Several well-defined approaches have been proposed, but the problem remains far from solved: the best systems achieve an 11% Word Error Rate. This paper presents a method that shares similarities with both spell checking and machine translation approaches. The normalization part of the system is entirely based on models trained on a corpus. Evaluated on French by 10-fold cross-validation, the system achieves a 9.3% Word Error Rate and a 0.83 BLEU score.
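
For readers unfamiliar with the metric, Word Error Rate is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a system output and a reference, divided by the number of reference words. The Python sketch below illustrates that standard definition only; the function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made for illustration, and the abstract does not describe the authors' actual scoring procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are obtained by splitting on whitespace; a real
    evaluation may tokenize and handle case or punctuation differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example (not taken from the paper's corpus):
# one wrong word out of four reference words.
print(word_error_rate("je suis en retard", "je sui en retard"))  # 0.25
```

Under this definition, a 9.3% Word Error Rate corresponds to roughly one word-level edit for every eleven reference words.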