Robust named entity detection using an Arabic offline handwriting recognition system

  • Authors: Krishna Subramanian; Rohit Prasad; Prem Natarajan

  • Affiliations: BBN Technologies, Cambridge, MA (all authors)

  • Venue: Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data

  • Year: 2009

Abstract

Text from Arabic optical handwriting recognition (OHR) systems can provide key indexing information. In particular, the text is rich in named entities (NEs), and detecting such entities is critical for search applications. Traditional approaches for detecting NEs in optical character recognition (OCR) output look for them in the single-best recognition results. Because recognition errors are inevitable in the single-best output, such approaches usually yield low recall. Given that a lattice is more likely to contain the correct answer, we explore NE detection from word lattices produced by our Arabic handwriting recognition system. Since the improvement in recall is accompanied by a large number of false positives, we use confidence scores based on posterior scores to control precision. We show a 7% improvement in true detects for the same false acceptance rate when using lattices instead of the 1-best hypothesis for NE lookup.
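The idea of looking up NEs in a lattice and filtering by posterior-based confidence can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not say how the posteriors are computed, so the sketch assumes a standard forward-backward pass over arc scores. The lattice encoding (tuples of source node, destination node, word, log score, with node ids in topological order), the single-word gazetteer lookup, the function names, and the toy transliterated words are all hypothetical.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logaddexp(a, b):
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def arc_posteriors(arcs, start, end):
    """Posterior probability of each arc via forward-backward.

    Assumes node ids are integers in topological order (src < dst).
    """
    # Forward pass: alpha[n] = log of total score of paths from start to n.
    alpha = defaultdict(lambda: NEG_INF)
    alpha[start] = 0.0
    for src, dst, _, score in sorted(arcs, key=lambda a: a[0]):
        alpha[dst] = logaddexp(alpha[dst], alpha[src] + score)

    # Backward pass: beta[n] = log of total score of paths from n to end.
    beta = defaultdict(lambda: NEG_INF)
    beta[end] = 0.0
    for src, dst, _, score in sorted(arcs, key=lambda a: a[1], reverse=True):
        beta[src] = logaddexp(beta[src], score + beta[dst])

    total = alpha[end]
    return [math.exp(alpha[s] + sc + beta[d] - total) for s, d, _, sc in arcs]


def detect_nes(arcs, start, end, gazetteer, threshold):
    """Return (word, posterior) for lattice arcs whose word is in the
    gazetteer and whose posterior meets the confidence threshold."""
    posts = arc_posteriors(arcs, start, end)
    return [
        (word, p)
        for (_, _, word, _), p in zip(arcs, posts)
        if word in gazetteer and p >= threshold
    ]


# Toy lattice: the 1-best path contains "mahmud", but the lattice also
# keeps the competing hypothesis "muhammad" with a lower score.
ARCS = [
    (0, 1, "mahmud", math.log(0.6)),
    (0, 1, "muhammad", math.log(0.4)),
    (1, 2, "fi", math.log(1.0)),
]

if __name__ == "__main__":
    print(detect_nes(ARCS, 0, 2, {"muhammad"}, threshold=0.3))
```

In this toy lattice a 1-best lookup for "muhammad" would miss, whereas the lattice lookup finds it. Raising the threshold trades those extra detections against false positives, which is the precision control the abstract refers to.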