Text classification using string kernels
The Journal of Machine Learning Research
Discovery of inference rules for question-answering
Natural Language Engineering
Bootstrapping bilingual data using consensus translation for a multilingual instant messaging system
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
A phrase-based statistical model for SMS text normalization
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Investigation and modeling of the structure of texting language
International Journal on Document Analysis and Recognition
Normalizing SMS: are two metaphors better than one?
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
An unsupervised model for text message normalization
CALC '09 Proceedings of the Workshop on Computational Approaches to Linguistic Creativity
Rewriting the orthography of sms messages
Natural Language Engineering
Transliteration generation and mining with limited training resources
NEWS '10 Proceedings of the 2010 Named Entities Workshop
Unsupervised cleansing of noisy text
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Lexical normalisation of short text messages: makn sens a #twitter
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Contextual bearing on linguistic variation in social media
LSM '11 Proceedings of the Workshop on Languages in Social Media
Aligning needles in a haystack: paraphrase acquisition across the web
IJCNLP'05 Proceedings of the Second international joint conference on Natural Language Processing
Automatically constructing a normalisation dictionary for microblogs
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Lexical normalization for social media text
ACM Transactions on Intelligent Systems and Technology (TIST) - Special section on twitter and microblogging services, social recommender systems, and CAMRa2010: Movie recommendation in context
Hi-index | 0.00 |
The amount of data produced in user-generated content continues to grow at a staggering rate. However, the text found in these media can deviate wildly from the standard rules of orthography, syntax and even semantics and present significant problems to downstream applications which make use of this noisy data. In this paper we present a novel unsupervised method for extracting domain-specific lexical variants given a large volume of text. We demonstrate the utility of this method by applying it to normalize text messages found in the online social media service, Twitter, into their most likely standard English versions. Our method yields a 20% reduction in word error rate over an existing state-of-the-art approach.