Robust named entity detection using an Arabic offline handwriting recognition system

Authors:
Krishna Subramanian; Rohit Prasad; Prem Natarajan
Affiliations:
BBN Technologies, Cambridge, MA (all authors)
Venue:
Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
Year:
2009

Abstract

Text produced by Arabic optical handwriting recognition (OHR) systems can provide key indexing information. In particular, this text is rich in named entities (NEs), and detecting them is critical for search applications. Traditional approaches to detecting NEs in optical character recognition (OCR) output search only the single-best recognition result. Because recognition errors inevitably appear in the single-best output, such approaches usually suffer from low recall. A word lattice is more likely than the single-best hypothesis to contain the correct words, so we explore NE detection in the word lattices produced by our Arabic handwriting recognition system. The gain in recall comes with a large number of false positives, so we use confidence scores based on posterior probabilities to control precision. Using lattices instead of the 1-best hypothesis for NE lookup yields a 7% improvement in true detections at the same false acceptance rate.
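The contrast between 1-best lookup and lattice lookup with a posterior threshold can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The lattice is assumed to be a DAG whose integer node IDs are in topological order, and each arc is assumed to carry a combined likelihood. Arc posteriors are computed with the standard forward-backward algorithm, which is one common way to obtain posterior-based confidence scores; the paper's exact confidence measure may differ. The function names (`arc_posteriors`, `one_best`, `detect_nes`), the toy lattice, the NE list, and the threshold values are all hypothetical. Only single-word NEs are handled.

```python
from collections import defaultdict

# Hypothetical lattice format (assumption): arcs are (start, end, word, likelihood),
# node IDs are topologically ordered (start < end), node 0 is the start node,
# and `final` is the end node.

def arc_posteriors(arcs, final):
    """Forward-backward over a DAG lattice; returns one posterior per arc."""
    alpha, beta = defaultdict(float), defaultdict(float)
    alpha[0], beta[final] = 1.0, 1.0
    # Forward pass: sorting by start node guarantees alpha[s] is complete.
    for s, e, _, p in sorted(arcs, key=lambda a: a[0]):
        alpha[e] += alpha[s] * p
    # Backward pass: reverse order guarantees beta[e] is complete.
    for s, e, _, p in sorted(arcs, key=lambda a: a[0], reverse=True):
        beta[s] += p * beta[e]
    total = alpha[final]  # total likelihood of all paths
    return [alpha[s] * p * beta[e] / total for s, e, _, p in arcs]

def one_best(arcs, final):
    """Viterbi search: the highest-likelihood word sequence."""
    best = {0: (1.0, None)}  # node -> (score, (previous node, word))
    for s, e, w, p in sorted(arcs, key=lambda a: a[0]):
        if s in best:
            score = best[s][0] * p
            if e not in best or score > best[e][0]:
                best[e] = (score, (s, w))
    words, node = [], final
    while best[node][1] is not None:  # backtrack from the final node
        node, w = best[node][1]
        words.append(w)
    return words[::-1]

def detect_nes(arcs, final, ne_list, threshold):
    """Lattice NE lookup: keep NE arcs whose posterior >= threshold."""
    hits = {}
    for (_, _, w, _), post in zip(arcs, arc_posteriors(arcs, final)):
        if w in ne_list and post >= threshold:
            hits[w] = max(post, hits.get(w, 0.0))  # best posterior per NE
    return hits

# Toy lattice: the correct name "cairo" loses to "carry" on the 1-best path.
lattice = [
    (0, 1, "visit", 1.0),
    (1, 2, "cairo", 0.4),
    (1, 2, "carry", 0.6),
    (2, 3, "today", 1.0),
]
names = {"cairo"}

print([w for w in one_best(lattice, 3) if w in names])  # []
print(detect_nes(lattice, 3, names, threshold=0.3))     # {'cairo': 0.4}
print(detect_nes(lattice, 3, names, threshold=0.5))     # {}
```

The 1-best lookup misses the name entirely. The lattice lookup recovers it, and raising the threshold trades that recall back for precision, which mirrors the trade-off described in the abstract.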