A design principles of a weighted finite-state transducer library
Theoretical Computer Science - Special issue on implementing automata
Finite-State Language Processing
Finite-State Language Processing
Generic epsilon -Removal Algorithm for Weighted Automata
CIAA '00 Revised Papers from the 5th International Conference on Implementation and Application of Automata
An efficient compiler for weighted rewrite rules
ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Bootstrapping bilingual data using consensus translation for a multilingual instant messaging system
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Pronunciation modeling for improved spelling correction
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
BLEU: a method for automatic evaluation of machine translation
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
A phrase-based statistical model for SMS text normalization
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Investigation and modeling of the structure of texting language
International Journal on Document Analysis and Recognition
Normalizing SMS: are two metaphors better than one?
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
An unsupervised model for text message normalization
CALC '09 Proceedings of the Workshop on Computational Approaches to Linguistic Creativity
A study of cross-validation and bootstrap for accuracy estimation and model selection
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Lexical normalisation of short text messages: makn sens a #twitter
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Subword and spatiotemporal models for identifying actionable information in Haitian Kreyol
CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning
Contextual bearing on linguistic variation in social media
LSM '11 Proceedings of the Workshop on Languages in Social Media
Short message communications: users, topics, and in-language processing
Proceedings of the 2nd ACM Symposium on Computing for Development
Review: SMS spam filtering: Methods and data
Expert Systems with Applications: An International Journal
Processing informal, romanized Pakistani text messages
LSM '12 Proceedings of the Second Workshop on Language in Social Media
A broad-coverage normalization system for social media language
ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1
Building a lightweight semantic model for unsupervised information extraction on short listings
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Approaches of anonymisation of an SMS corpus
CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Normalization of informal text
Computer Speech and Language
Chinese-English mixed text normalization
Proceedings of the 7th ACM international conference on Web search and data mining
Hi-index | 0.00 |
In recent years, research in natural language processing has increasingly focused on normalizing SMS messages. Different well-defined approaches have been proposed, but the problem remains far from being solved: best systems achieve a 11% Word Error Rate. This paper presents a method that shares similarities with both spell checking and machine translation approaches. The normalization part of the system is entirely based on models trained from a corpus. Evaluated in French by 10-fold-cross validation, the system achieves a 9.3% Word Error Rate and a 0.83 BLEU score.