tRuEcasIng

  • Authors:
  • Lucian Vlad Lita;Abe Ittycheriah;Salim Roukos;Nanda Kambhatla

  • Affiliations:
  • Carnegie Mellon;IBM T.J. Watson;IBM T.J. Watson;IBM T.J. Watson

  • Venue:
  • ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Truecasing is the process of restoring case information to badly-cased or non-cased text. This paper explores truecasing issues and proposes a statistical, language modeling based truecaser which achieves an accuracy of ~98% on news articles. Task based evaluation shows a 26% F-measure improvement in named entity recognition when using truecasing. In the context of automatic content extraction, mention detection on automatic speech recognition text is also improved by a factor of 8. Truecasing also enhances machine translation output legibility and yields a BLEU score improvement of 80.2%. This paper argues for the use of truecasing as a valuable component in text processing applications.