The mathematics of statistical machine translation: parameter estimation
Computational Linguistics - Special issue on using large corpora: II
Bitext maps and alignment via pattern recognition
Computational Linguistics
Decoding complexity in word-replacement translation models
Computational Linguistics
A DP based search using monotone alignments in statistical translation
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
Automatic construction of machine translation knowledge using translation literalness
EACL '03 Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics - Volume 1
A phrase-based statistical model for SMS text normalization
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Language Models for Handwritten Short Message Services
ICDAR '07 Proceedings of the Ninth International Conference on Document Analysis and Recognition - Volume 01
Investigation and modeling of the structure of texting language
International Journal on Document Analysis and Recognition
Moses: open source toolkit for statistical machine translation
ACL '07 Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions
Normalizing SMS: are two metaphors better than one?
COLING '08 Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1
Design of the moses decoder for statistical machine translation
SETQA-NLP '08 Software Engineering, Testing, and Quality Assurance for Natural Language Processing
Fast sequential decoding algorithm using a stack
IBM Journal of Research and Development
Language independent unsupervised learning of short message service dialect
International Journal on Document Analysis and Recognition - Special Issue NOISY
SMS based interface for FAQ retrieval
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2
Automatic filtering of bilingual corpora for statistical machine translation
NLDB'05 Proceedings of the 10th international conference on Natural Language Processing and Information Systems
Handling noisy queries in cross language FAQ retrieval
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Contextual bearing on linguistic variation in social media
LSM '11 Proceedings of the Workshop on Languages in Social Media
Experiments with artificially generated noise for cleansing noisy text
Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data
Unsupervised mining of lexical variants from noisy text
EMNLP '11 Proceedings of the First Workshop on Unsupervised Learning in NLP
Automatically constructing a normalisation dictionary for microblogs
EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Lexical normalization for social media text
ACM Transactions on Intelligent Systems and Technology (TIST) - Special section on twitter and microblogging services, social recommender systems, and CAMRa2010: Movie recommendation in context
Evaluating bad query abandonment in an iterative SMS-based FAQ retrieval system
Proceedings of the 10th Conference on Open Research Areas in Information Retrieval
Normalization of informal text
Computer Speech and Language
A statistical approach to mining customers' conversational data from social media
IBM Journal of Research and Development
Hi-index | 0.00 |
In this paper we look at the problem of cleansing noisy text using a statistical machine translation model. Noisy text is produced in informal communications such as Short Message Service (SMS), Twitter and chat. A typical Statistical Machine Translation system is trained on parallel text comprising noisy and clean sentences. In this paper we propose an unsupervised method for the translation of noisy text to clean text. Our method has two steps. For a given noisy sentence, a weighted list of possible clean tokens for each noisy token are obtained. The clean sentence is then obtained by maximizing the product of the weighted lists and the language model scores.