An unsupervised model for text message normalization

  • Authors:
  • Paul Cook; Suzanne Stevenson

  • Affiliations:
  • University of Toronto, Toronto, Canada; University of Toronto, Toronto, Canada

  • Venue:
  • CALC '09 Proceedings of the Workshop on Computational Approaches to Linguistic Creativity
  • Year:
  • 2009

Abstract

Cell phone text messaging users express themselves briefly and colloquially using a variety of creative forms. We analyze a sample of creative, non-standard text message word forms to determine frequent word formation processes in texting language. Drawing on these observations, we construct an unsupervised noisy-channel model for text message normalization. On a test set of 303 text message forms that differ from their standard form, our model achieves 59% accuracy, which is on par with the best supervised results reported on this dataset.
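
As a rough illustration of the noisy-channel idea mentioned in the abstract: given an observed text form t, a normalizer of this kind chooses the standard word w that maximizes P(w) * P(t | w). The sketch below shows only that decision rule. The tiny lexicon, its probabilities, and the subsequence-based channel model are all assumptions made for illustration; they are not the word formation models or data described in the paper.

```python
# Hypothetical toy unigram language model P(w); the values are illustrative only.
WORD_PROBS = {"tomorrow": 0.4, "see": 0.3, "you": 0.2, "tomato": 0.1}


def is_subsequence(short, long):
    """Return True if the characters of `short` appear in `long` in order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def channel_prob(text_form, word):
    """Toy channel model P(text_form | word), assumed for this sketch.

    A form produced by deleting letters from the word is penalized by a
    factor of 0.9 per dropped letter; any other form gets a tiny
    smoothing probability.
    """
    if is_subsequence(text_form, word):
        return 0.9 ** (len(word) - len(text_form))
    return 1e-9


def normalize(text_form):
    """Return the word w maximizing P(w) * P(text_form | w)."""
    return max(WORD_PROBS, key=lambda w: WORD_PROBS[w] * channel_prob(text_form, w))


print(normalize("tmrw"))
```

In this toy setup, "tmrw" can be obtained from "tomorrow" by deleting letters but not from "tomato", so the channel term separates the two candidates even though both begin with the same letters. A fuller model would need separate channel components for other formation processes (for example phonetic substitutions), which this sketch does not attempt.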