This paper describes a noisy-channel approach to normalizing informal text, such as that found in emails, chat rooms, and SMS messages. In particular, we introduce two character-level methods for the abbreviation-modeling component of the noisy-channel model: a statistical classifier that uses language-based features to decide whether a character is likely to be removed from a word, and a character-level machine translation model. The approach has two stages: in the first, candidates are generated using the selected abbreviation model; in the second, the best candidate is chosen by decoding with a language model. Overall, we find that this approach works well and is on par with current research in the field.
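
The two-stage noisy-channel idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. The fixed per-character deletion probabilities in `p_delete` stand in for the feature-based classifier, and the tiny unigram table `LM` stands in for a real language model. Every name and number below is a hypothetical choice made for illustration.

```python
import math

# Hypothetical unigram language model P(word); the values are illustrative only.
LM = {"tomorrow": 0.05, "tomato": 0.02, "you": 0.30,
      "see": 0.20, "to": 0.40, "too": 0.03}

VOWELS = set("aeiou")


def p_delete(ch):
    """Assumed probability that a character is dropped when abbreviating.

    Stand-in for a trained classifier: vowels are dropped more often.
    """
    return 0.6 if ch in VOWELS else 0.2


def channel_prob(word, abbr):
    """P(abbr | word) when each character of word is independently kept or deleted.

    f[i][j] is the probability that the first i characters of word
    produce exactly the first j characters of abbr, summed over alignments.
    """
    n, m = len(word), len(abbr)
    f = [[0.0] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 1.0
    for i in range(1, n + 1):
        ch = word[i - 1]
        for j in range(m + 1):
            # Case 1: the i-th character of word is deleted.
            f[i][j] = f[i - 1][j] * p_delete(ch)
            # Case 2: it is kept and matches the j-th character of abbr.
            if j > 0 and abbr[j - 1] == ch:
                f[i][j] += f[i - 1][j - 1] * (1.0 - p_delete(ch))
    return f[n][m]


def generate_candidates(token, k=3):
    """Stage 1: rank vocabulary words by the abbreviation (channel) model."""
    scored = [(w, channel_prob(w, token)) for w in LM]
    scored = [(w, p) for w, p in scored if p > 0.0]
    scored.sort(key=lambda wp: wp[1], reverse=True)
    return scored[:k]


def normalize(token):
    """Stage 2: pick the candidate maximizing log P(token | w) + log P(w)."""
    candidates = generate_candidates(token)
    if not candidates:
        return token  # no vocabulary word can produce this token; leave it as-is
    best_word, _ = max(candidates,
                       key=lambda wp: math.log(wp[1]) + math.log(LM[wp[0]]))
    return best_word


if __name__ == "__main__":
    print(" ".join(normalize(t) for t in "see u tmrw".split()))
    # prints: see you tomorrow
```

The sketch scores each token in isolation with a unigram model. A decoder over a whole sentence would typically use a higher-order language model so that neighboring words influence which candidate is chosen.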