Advances in the BBN BYBLOS OCR System

Authors:
Zhidong Lu;Richard Schwartz;Premkumar Natarajan;Issam Bazzi;John Makhoul
Venue:
ICDAR '99 Proceedings of the Fifth International Conference on Document Analysis and Recognition
Year:
1999

Abstract

In this paper, we present recent advances in the BBN BYBLOS OCR system, which recognizes Arabic, Chinese, and English text with high accuracy. A major change in the system is the use of continuous-density hidden Markov models (HMMs), which allow us to take advantage of large amounts of training data and to use unsupervised adaptation methods that improve accuracy in many cases, e.g., on degraded data. Another advance is a substantial increase in recognition speed, which makes the system fast enough for practical use on Arabic and English data. The extension of the system to Chinese further demonstrates its language independence and shows that it can handle languages with large character sets and complicated character structures. The Chinese OCR system yielded high accuracy on newspaper data.
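The abstract does not say how the continuous densities are parameterized. Purely as an illustration, the sketch below assumes diagonal-covariance Gaussian-mixture emission densities (a common choice for continuous-density HMMs) and scores a sequence of feature vectors, such as features extracted from successive slices of a text line, with Viterbi decoding. The function names, the toy two-state model, and the 1-D features are hypothetical and are not taken from the BYBLOS system.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: one mixture component is (weight, means, variances).
Component = Tuple[float, List[float], List[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    ll = 0.0
    for xi, mi, vi in zip(x, mean, var):
        ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return ll


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_emission(x: Sequence[float], mixture: List[Component]) -> float:
    """Continuous emission density: log sum_k w_k N(x; mu_k, diag(var_k))."""
    return log_sum_exp(
        [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in mixture]
    )


def viterbi(frames, mixtures, log_trans, log_init):
    """Return the most likely state sequence for the frames and its log score."""
    n_states = len(mixtures)
    score = [log_init[s] + log_emission(frames[0], mixtures[s]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for x in frames[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(
                score[best_prev] + log_trans[best_prev][s] + log_emission(x, mixtures[s])
            )
            ptr.append(best_prev)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    # Toy left-to-right model: state 0 emits values near 0, state 1 near 5.
    mixtures = [[(1.0, [0.0], [1.0])], [(1.0, [5.0], [1.0])]]
    log_trans = [[math.log(0.5), math.log(0.5)], [-math.inf, 0.0]]
    log_init = [0.0, -math.inf]
    frames = [[0.1], [-0.2], [4.9], [5.3]]
    print(viterbi(frames, mixtures, log_trans, log_init)[0])
```

In this toy run the decoder stays in state 0 for the two frames near 0 and moves to state 1 for the two frames near 5. Because each state models a continuous density rather than a fixed codebook of discrete symbols, adding mixture components lets a state absorb more training data, which is one reason continuous-density models can benefit from large training sets.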