In this paper, we introduce a new operational platform for end-to-end document image analysis, recognition, and machine translation. The Raytheon BBN Document Analysis Service (BBN DAS) performs the following operations on scanned machine-print document images: (1) image pre-processing and segmentation to identify homogeneous zones of text, (2) text recognition to convert the text zones into electronic text, (3) machine translation of the text from the document's native language into English, and (4) document archiving and indexing for effective content-based search. BBN DAS uses a service-oriented architecture (SOA), which provides the modularity and scalability to operate on hardware configurations ranging from a laptop to distributed multi-node server environments. This paper describes the platform architecture, the process of configuring it for Arabic newsprint documents, and the performance of the resulting Arabic system.
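The four-stage flow described above can be illustrated with a minimal sketch, assuming each stage is exposed behind one common interface (a document in, a document out). This is not the BBN DAS implementation or API. `Document`, `segment`, `recognize`, `make_translator`, `Archive`, and `run_pipeline` are hypothetical names, and every stage is a toy stand-in: blank-line splitting in place of zone segmentation, byte decoding in place of OCR, and word-by-word lexicon lookup in place of machine translation. Only the indexing stage, a simple inverted index, is a real instance of its technique.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Document:
    """A scanned page moving through the pipeline (hypothetical schema)."""
    doc_id: str
    image: bytes                                   # raw page image (opaque stand-in)
    zones: List[bytes] = field(default_factory=list)
    native_text: List[str] = field(default_factory=list)
    english_text: List[str] = field(default_factory=list)


# Every stage shares one signature, so any stage can be swapped out
# (or, in an SOA deployment, hosted on another node) independently.
Stage = Callable[[Document], Document]


def segment(doc: Document) -> Document:
    # Toy stand-in for pre-processing/segmentation: blank-line-separated
    # blocks of the "image" play the role of homogeneous text zones.
    doc.zones = [z for z in doc.image.split(b"\n\n") if z.strip()]
    return doc


def recognize(doc: Document) -> Document:
    # Toy stand-in for OCR: decode each zone's bytes directly to text.
    doc.native_text = [z.decode("utf-8").strip() for z in doc.zones]
    return doc


def make_translator(lexicon: Dict[str, str]) -> Stage:
    # Toy stand-in for MT: word-by-word lookup; unknown words pass through.
    def translate(doc: Document) -> Document:
        doc.english_text = [
            " ".join(lexicon.get(w, w) for w in line.split())
            for line in doc.native_text
        ]
        return doc
    return translate


class Archive:
    """Archiving/indexing stage: stores documents and an inverted index."""

    def __init__(self) -> None:
        self.docs: Dict[str, Document] = {}
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def __call__(self, doc: Document) -> Document:
        self.docs[doc.doc_id] = doc
        for line in doc.english_text:
            for word in line.lower().split():
                self.index[word].add(doc.doc_id)   # word -> ids of docs containing it
        return doc

    def search(self, word: str) -> Set[str]:
        # .get avoids inserting empty entries into the defaultdict
        return set(self.index.get(word.lower(), set()))


def run_pipeline(stages: List[Stage], doc: Document) -> Document:
    # Apply each stage in order, feeding each one's output to the next.
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    archive = Archive()
    stages = [segment, recognize,
              make_translator({"kitab": "book", "jadid": "new"}), archive]
    run_pipeline(stages, Document("page-1", b"kitab jadid\n\nsalam"))
    print(archive.search("book"))  # {'page-1'}
```

Because the stages share one signature, replacing the toy translator with a real MT client, or the in-process `Archive` with a remote search service, changes one list entry rather than the pipeline itself. This is the kind of modularity the abstract attributes to the SOA design, though how BBN DAS actually wires its services is not specified here.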