A Case Restoration Approach to Named Entity Tagging in Degraded Documents

  • Authors:
  • Rohini K. Srihari; Cheng Niu; Wei Li; Jihong Ding

  • Venue:
  • ICDAR '03 Proceedings of the Seventh International Conference on Document Analysis and Recognition - Volume 2
  • Year:
  • 2003

Abstract

This paper describes a novel approach to named entity (NE) tagging on degraded documents. NE tagging is the process of identifying salient text strings in unstructured text, corresponding to names of people, places, organizations, times/dates, etc. Although NE tagging is typically part of a larger information extraction process, it has other applications, such as improving search in an information retrieval system, and post-processing the results of an OCR system. We focus on degraded documents, i.e. case insensitive documents that lack orthographic information. Examples include output of speech recognition systems, as well as e-mail. The traditional approach involves retraining an NE tagger on degraded text, a cumbersome operation. This paper describes an approach whereby text is first "restored" to its implicit case sensitive form, and subsequently processed by the original NE tagger. Results show that this new approach leads to far less precision loss in NE tagging of degraded documents.
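
The two-stage idea in the abstract (restore case first, then run an unchanged case-sensitive tagger) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not say how case restoration is performed, so the unigram "most frequent surface form" lookup below is a swapped-in toy technique, and `train_case_model`, `restore_case`, and `toy_ne_tagger` are hypothetical names rather than components of the authors' system.

```python
from collections import Counter, defaultdict


def train_case_model(cased_sentences):
    """Map each lowercased token to its most frequent cased form (toy unigram model)."""
    forms = defaultdict(Counter)
    for sentence in cased_sentences:
        tokens = sentence.split()
        # Skip the first token: sentence-initial capitalization carries no signal.
        for token in tokens[1:]:
            forms[token.lower()][token] += 1
    return {lower: counts.most_common(1)[0][0] for lower, counts in forms.items()}


def restore_case(text, model):
    """Re-case each token from the model; unknown tokens stay lowercase."""
    tokens = [model.get(tok.lower(), tok.lower()) for tok in text.split()]
    if tokens:
        # Capitalize the first letter of the sentence.
        tokens[0] = tokens[0][:1].upper() + tokens[0][1:]
    return " ".join(tokens)


def toy_ne_tagger(text):
    """Hypothetical stand-in for a case-sensitive tagger: returns capitalized
    tokens that are not sentence-initial."""
    tokens = text.split()
    return [tok for i, tok in enumerate(tokens) if i > 0 and tok[:1].isupper()]


# Tiny cased training corpus (illustrative data, not from the paper).
corpus = [
    "Yesterday the manager met Rohini in Buffalo",
    "Later the team flew to Buffalo with Rohini",
]
model = train_case_model(corpus)

degraded = "the manager flew to buffalo with rohini"
restored = restore_case(degraded, model)

print(toy_ne_tagger(degraded))   # the case-sensitive tagger finds nothing
print(restored)
print(toy_ne_tagger(restored))   # the same tagger, run after case restoration
```

The point of the sketch is the design choice the abstract argues for: the tagger itself is left untouched, and only a separate restoration step is trained, so no retraining of the tagger on degraded text is needed.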