Normalizing SMS: are two metaphors better than one?

Authors:
Catherine Kobus;François Yvon;Géraldine Damnati
Affiliations:
Orange Labs, Lannion;Univ Paris-Sud & LIMSI/CNRS, Orsay Cedex;Orange Labs, Lannion
Venue:
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Year:
2008

Citing 9
Cited 23

A statistical approach to machine translation

Computational Linguistics
A systematic comparison of various statistical alignment models

Computational Linguistics
Real-time automatic insertion of accents in French text

Natural Language Engineering
Inference of string mappings for language technology

Inference of string mappings for language technology
Pronunciation modeling for improved spelling correction

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
An improved error model for noisy channel spelling correction

ACL '00 Proceedings of the 38th Annual Meeting on Association for Computational Linguistics
Language and the Internet

Language and the Internet
A phrase-based statistical model for SMS text normalization

COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Moses: open source toolkit for statistical machine translation

ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions

A survey of types of text noise and techniques to handle noisy text

Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
An unsupervised model for text message normalization

CALC '09 Proceedings of the Workshop on Computational Approaches to Linguistic Creativity
SMS based interface for FAQ retrieval

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Rewriting the orthography of sms messages

Natural Language Engineering
Subword variation in text message classification

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
A hybrid rule/model-based finite-state framework for normalizing SMS messages

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Handling noisy queries in cross language FAQ retrieval

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Unsupervised cleansing of noisy text

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Insertion, deletion, or substitution?: normalizing text messages without pre-categorization nor supervision

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Contextual bearing on linguistic variation in social media

LSM '11 Proceedings of the Workshop on Languages in Social Media
Experiments with artificially generated noise for cleansing noisy text

Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data
SMS normalization: combining phonetics, morphology and semantics

CAEPIA'11 Proceedings of the 14th international conference on Advances in artificial intelligence: spanish association for artificial intelligence
Topics as contextual indicators for word choice in SMS conversations

SIGDIAL '11 Proceedings of the SIGDIAL 2011 Conference
Unsupervised mining of lexical variants from noisy text

EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
Named entity recognition in tweets: an experimental study

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Short message communications: users, topics, and in-language processing

Proceedings of the 2nd ACM Symposium on Computing for Development
Review: SMS spam filtering: Methods and data

Expert Systems with Applications: An International Journal
Autonomous self-assessment of autocorrections: exploring text message dialogues

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
French presidential elections: what are the most efficient measures for tweets?

Proceedings of the first edition workshop on Politics, elections and data
A broad-coverage normalization system for social media language

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Statistical machine translation enhancements through linguistic levels: A survey

ACM Computing Surveys (CSUR)
Normalization of informal text

Computer Speech and Language
Chinese-English mixed text normalization

Proceedings of the 7th ACM international conference on Web search and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Electronic written texts used in computermediated interactions (e-mails, blogs, chats, etc) present major deviations from the norm of the language. This paper presents an comparative study of systems aiming at normalizing the orthography of French SMS messages: after discussing the linguistic peculiarities of these messages, and possible approaches to their automatic normalization, we present, evaluate and contrast two systems, one drawing inspiration from the Machine Translation task; the other using techniques that are commonly used in automatic speech recognition devices. Combining both approaches, our best normalization system achieves about 11% Word Error Rate on a test set of about 3000 unseen messages.