Text from Arabic optical handwriting recognition (OHR) systems can provide key indexing information. In particular, the text is rich in named entities (NEs), and detecting these entities is critical for search applications. Traditional approaches to detecting NEs in optical character recognition (OCR) output look for them in the single-best recognition result. Because the single-best output inevitably contains recognition errors, such approaches usually suffer from low recall. Since a lattice is more likely to contain the correct answer, we explore NE detection from the word lattices produced by our Arabic handwriting recognition system. The gain in recall comes with a large number of false positives, so we use confidence measures based on posterior scores to control precision. Using lattices instead of the 1-best hypothesis for NE lookup yields a 7% improvement in true detections at the same false acceptance rate.
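To make the lattice-versus-1-best comparison concrete, here is a minimal sketch under stated assumptions; it is not the authors' system. It assumes a toy lattice format (integer nodes numbered in topological order, arcs carrying linear-scale probabilities), a gazetteer of word tuples, and hypothetical function names (`forward_backward`, `lattice_ne_detections`, `filter_by_confidence`, `one_best`, `one_best_detections`). Each gazetteer hit found along any lattice path is scored by the posterior probability of its arc span, computed here with the forward-backward algorithm, which is one standard way to obtain lattice posteriors; the paper's exact confidence measure may differ. The baseline looks up NEs only on the Viterbi 1-best path.

```python
from collections import defaultdict

# Hypothetical lattice format: integer nodes numbered in topological order
# (0 = start, `end` = final node); each arc is (src, dst, word, prob), where
# prob is a linear-scale score assumed to combine the recognizer's model scores.
Arc = tuple[int, int, str, float]


def forward_backward(arcs: list[Arc], end: int):
    """Sum of path scores reaching (alpha) and leaving (beta) every node."""
    alpha, beta = defaultdict(float), defaultdict(float)
    alpha[0] = 1.0
    for src, dst, _, p in sorted(arcs, key=lambda a: a[0]):
        alpha[dst] += alpha[src] * p
    beta[end] = 1.0
    for src, dst, _, p in sorted(arcs, key=lambda a: a[0], reverse=True):
        beta[src] += p * beta[dst]
    return alpha, beta


def lattice_ne_detections(arcs: list[Arc], end: int, gazetteer: set[tuple[str, ...]]):
    """Map (ne, start_node, end_node) -> posterior of that word span over all paths."""
    alpha, beta = forward_backward(arcs, end)
    total = alpha[end]
    out_arcs = defaultdict(list)
    for arc in arcs:
        out_arcs[arc[0]].append(arc)
    max_len = max(len(ne) for ne in gazetteer)
    scores = defaultdict(float)

    def extend(start, node, words, prob):
        # Posterior of an arc sequence = alpha(start) * arc probs * beta(node) / total.
        if words in gazetteer:
            scores[(" ".join(words), start, node)] += alpha[start] * prob * beta[node] / total
        if len(words) < max_len:
            for _, dst, word, p in out_arcs.get(node, []):
                extend(start, dst, words + (word,), prob * p)

    for src, dst, word, p in arcs:
        extend(src, dst, (word,), p)
    return dict(scores)


def filter_by_confidence(scores, threshold: float):
    """Keep detections whose posterior meets the threshold (higher = more precise)."""
    return {key: s for key, s in scores.items() if s >= threshold}


def one_best(arcs: list[Arc], end: int) -> tuple[str, ...]:
    """Viterbi (highest-scoring) word sequence through the lattice."""
    best = {0: (1.0, ())}
    for src, dst, word, p in sorted(arcs, key=lambda a: a[0]):
        if src in best:
            score = best[src][0] * p
            if dst not in best or score > best[dst][0]:
                best[dst] = (score, best[src][1] + (word,))
    return best[end][1]


def one_best_detections(words, gazetteer) -> set[str]:
    """Baseline: look up gazetteer entries only in the single-best word string."""
    max_len = max(len(ne) for ne in gazetteer)
    return {" ".join(words[i:i + n]) for i in range(len(words))
            for n in range(1, max_len + 1) if tuple(words[i:i + n]) in gazetteer}


# Toy example (transliterated words, invented probabilities): the recognizer
# prefers the misspelling "abo", so the 1-best string misses "abu dhabi".
EXAMPLE_ARCS = [
    (0, 1, "karim", 0.75), (0, 1, "karem", 0.25),
    (1, 2, "visited", 1.0),
    (2, 3, "abu", 0.25), (2, 3, "abo", 0.75),
    (3, 4, "dhabi", 1.0),
]
GAZETTEER = {("karim",), ("abu", "dhabi")}

if __name__ == "__main__":
    print(one_best_detections(one_best(EXAMPLE_ARCS, 4), GAZETTEER))  # {'karim'}
    scores = lattice_ne_detections(EXAMPLE_ARCS, 4, GAZETTEER)
    print(scores)  # {('karim', 0, 1): 0.75, ('abu dhabi', 2, 4): 0.25}
    print(filter_by_confidence(scores, 0.5))  # {('karim', 0, 1): 0.75}
```

In this toy lattice the 1-best lookup finds only "karim", while the lattice lookup also recovers "abu dhabi" with posterior 0.25. A threshold of 0.5 suppresses that lower-confidence hit, and a threshold of 0.2 admits it, which mirrors the recall/precision trade-off described above. In practice the threshold would presumably be tuned on held-out data to a target false acceptance rate.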