tRuEcasIng

Authors:
Lucian Vlad Lita;Abe Ittycheriah;Salim Roukos;Nanda Kambhatla
Affiliations:
Carnegie Mellon;IBM T.J. Watson;IBM T.J. Watson;IBM T.J. Watson
Venue:
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Year:
2003

Abstract

Truecasing is the process of restoring case information to badly cased or uncased text. This paper explores truecasing issues and proposes a statistical, language-modeling-based truecaser that achieves an accuracy of ~98% on news articles. Task-based evaluation shows a 26% F-measure improvement in named entity recognition when truecasing is used. In the context of automatic content extraction, mention detection on automatic speech recognition text also improves by a factor of 8. Truecasing also makes machine translation output more legible and yields a BLEU score improvement of 80.2%. The paper argues that truecasing is a valuable component in text processing applications.
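
To make the idea of a language-modeling-based truecaser concrete, here is a minimal sketch. It is not the paper's system: the abstract does not specify the model, so the bigram model, the interpolation weight of 0.7, the add-one unigram smoothing, the class name `BigramTruecaser`, and the toy corpus are all assumptions made for illustration. The sketch collects the cased forms seen for each lowercased token in training text, then uses Viterbi search to pick the sequence of forms with the highest bigram score.

```python
import math
from collections import Counter, defaultdict


class BigramTruecaser:
    """Toy truecaser (illustrative only): choose case variants by bigram LM score."""

    def __init__(self, sentences, lam=0.7):
        self.lam = lam                       # assumed interpolation weight
        self.unigrams = Counter()            # cased token -> count
        self.bigrams = Counter()             # (prev, cur) -> count
        self.context = Counter()             # prev -> count as a bigram context
        self.variants = defaultdict(set)     # lowercased token -> cased forms seen
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split()
            for prev, cur in zip(tokens, tokens[1:]):
                self.unigrams[cur] += 1
                self.bigrams[(prev, cur)] += 1
                self.context[prev] += 1
                self.variants[cur.lower()].add(cur)
        self.total = sum(self.unigrams.values())
        self.vocab_size = len(self.unigrams) + 1  # +1 for unseen tokens

    def _logp(self, prev, cur):
        """Interpolated log P(cur | prev); the smoothed unigram keeps it nonzero."""
        seen = self.context[prev]
        bigram = self.bigrams[(prev, cur)] / seen if seen else 0.0
        unigram = (self.unigrams[cur] + 1) / (self.total + self.vocab_size)
        return math.log(self.lam * bigram + (1 - self.lam) * unigram)

    def truecase(self, text):
        """Viterbi search over the case variants of each whitespace token."""
        best = {"<s>": (0.0, [])}            # last form -> (score, path so far)
        for word in text.split():
            # Unknown words have no observed variants, so they pass through as-is.
            candidates = self.variants.get(word.lower(), {word})
            step = {}
            for cand in candidates:
                step[cand] = max(
                    ((score + self._logp(prev, cand), path + [cand])
                     for prev, (score, path) in best.items()),
                    key=lambda item: item[0],
                )
            best = step
        return " ".join(max(best.values(), key=lambda item: item[0])[1])


# Hypothetical training data, just large enough to show context-sensitive casing.
corpus = [
    "The president visited New York",
    "IBM opened a new office in New York",
    "She bought a new car",
    "The new office is in Paris",
]
truecaser = BigramTruecaser(corpus)
print(truecaser.truecase("ibm opened a new office in new york"))
# prints: IBM opened a new office in New York
```

In the example, the two occurrences of "new" receive different casing because the neighboring words differ. A lookup of the most frequent form per word could not do this.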