Truecasing is the process of restoring case information to badly-cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language-model-based truecaser that achieves an accuracy of approximately 98% on news articles. Task-based evaluation shows a 26% F-measure improvement in named entity recognition when truecasing is used. In the context of automatic content extraction, mention detection on automatic speech recognition (ASR) text also improves by a factor of 8. Truecasing also improves the legibility of machine translation output and yields a BLEU score improvement of 80.2%. This paper argues for truecasing as a valuable component of text processing applications.
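The abstract does not spell out the model, so the following is only a minimal sketch of one way a language-model-based truecaser can work, under stated assumptions. It is not the paper's method. Each lowercased input word has as hidden states the cased variants seen in training. A bigram language model trained on cased text scores transitions, and Viterbi decoding picks the most likely cased sequence. The class name `BigramTruecaser`, the add-alpha smoothing constant, and the toy training sentences are hypothetical. A practical system would likely use a higher-order model, better smoothing, and a larger corpus.

```python
import math
from collections import Counter, defaultdict


class BigramTruecaser:
    """Hypothetical sketch: HMM-style truecaser whose hidden states are the
    cased variants of each lowercased token, scored by a bigram LM and
    decoded with Viterbi. Not the paper's exact model."""

    def __init__(self, sentences, alpha=0.1):
        self.alpha = alpha  # add-alpha smoothing constant (assumed value)
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.variants = defaultdict(set)  # lowercase form -> cased forms seen
        for sent in sentences:
            tokens = ["<s>"] + sent.split()
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
            for tok in tokens[1:]:
                self.variants[tok.lower()].add(tok)
        self.vocab_size = len(self.unigrams)

    def logprob(self, prev, word):
        # Add-alpha smoothed bigram log-probability log P(word | prev).
        num = self.bigrams[(prev, word)] + self.alpha
        den = self.unigrams[prev] + self.alpha * self.vocab_size
        return math.log(num / den)

    def truecase(self, text):
        words = text.lower().split()
        # Candidate cased forms per position; unseen words keep their input form.
        candidates = [sorted(self.variants.get(w, {w})) for w in words]
        # Viterbi table: state -> (best log score, best path ending in state).
        best = {"<s>": (0.0, [])}
        for cands in candidates:
            new_best = {}
            for cand in cands:
                new_best[cand] = max(
                    (score + self.logprob(prev, cand), path + [cand])
                    for prev, (score, path) in best.items()
                )
            best = new_best
        return " ".join(max(best.values())[1])


if __name__ == "__main__":
    corpus = [
        "The president visited New York .",
        "He said the city of New York is big .",
        "The White House said the president will return .",
    ]
    tc = BigramTruecaser(corpus)
    print(tc.truecase("the president visited new york ."))
    # -> The president visited New York .
```

The bigram context is what separates sentence-initial "The" from mid-sentence "the". Words never seen in training (for example "paris") pass through unchanged, which illustrates why coverage of the cased training text matters for truecasing accuracy.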