The BBN document analysis service: a platform for multilingual document translation

  • Authors:
  • Ehry MacRostie;Rohit Prasad;Stephen Rawls;Matin Kamali;Huaigu Cao;Krishna Subramanian;Prem Natarajan

  • Affiliations:
  • Raytheon BBN Technologies, Cambridge, MA (all authors)

  • Venue:
  • DAS '10 Proceedings of the 9th IAPR International Workshop on Document Analysis Systems
  • Year:
  • 2010

Abstract

In this paper, we introduce a new operational platform for end-to-end document image analysis, recognition, and machine translation. The Raytheon BBN Document Analysis Service (BBN DAS) performs the following operations on scanned machine-print document images: (1) image pre-processing and segmentation to identify homogeneous zones of text, (2) text recognition to convert the text zones into electronic text, (3) machine translation to convert the text from the native language of the document into English, and (4) document archiving and indexing for effective content-based search. BBN DAS uses a service-oriented architecture (SOA), which offers modularity and scalability for operation on hardware configurations ranging from a laptop to distributed multi-node server environments. This paper describes the platform architecture, the process of configuring it for Arabic newsprint documents, and the resulting performance of the Arabic system.
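
The four operations listed in the abstract can be pictured as a chain of independent stages, each of which could be deployed as its own service. The Python sketch below is only an illustration of that modular idea and is not taken from BBN DAS: the names Document, segment, recognize, translate, archive, INDEX, and run_pipeline are all hypothetical, and each stage body is a placeholder that produces dummy strings rather than performing real image analysis, recognition, or translation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical record passed from stage to stage (not the BBN DAS data model)."""
    image_id: str
    zones: List[str] = field(default_factory=list)         # text zones found by segmentation
    source_text: List[str] = field(default_factory=list)   # recognized text, one entry per zone
    english_text: List[str] = field(default_factory=list)  # translated text, one entry per zone
    indexed: bool = False


# A stage takes a Document and returns the updated Document.
Stage = Callable[[Document], Document]

# Placeholder archive: maps an image id to its translated text.
INDEX: Dict[str, List[str]] = {}


def segment(doc: Document) -> Document:
    # Placeholder for step (1): pretend every page contains two text zones.
    doc.zones = [f"{doc.image_id}:zone{i}" for i in range(2)]
    return doc


def recognize(doc: Document) -> Document:
    # Placeholder for step (2): one recognized string per zone.
    doc.source_text = [f"text({zone})" for zone in doc.zones]
    return doc


def translate(doc: Document) -> Document:
    # Placeholder for step (3): one English string per recognized string.
    doc.english_text = [f"en({text})" for text in doc.source_text]
    return doc


def archive(doc: Document) -> Document:
    # Placeholder for step (4): store the translated text under the image id.
    INDEX[doc.image_id] = doc.english_text
    doc.indexed = True
    return doc


# Assumed stage order, following the numbering in the abstract.
PIPELINE: List[Stage] = [segment, recognize, translate, archive]


def run_pipeline(doc: Document, stages: List[Stage] = PIPELINE) -> Document:
    """Apply each stage in order; a stage can be swapped without touching the others."""
    for stage in stages:
        doc = stage(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline(Document("page1"))
    print(result.english_text)
```

Because every stage shares the same signature in this sketch, a stage could be replaced by a call to a remote service, or by a recognizer configured for another language, without changing run_pipeline. That is the kind of modularity a service-oriented design is meant to provide, though the actual BBN DAS interfaces are not described in the abstract.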