BigBatch: a document processing platform for clusters and grids

  • Authors:
  • Giorgia Mattos; Rafael Dueire Lins; Andrei de Araújo Formiga; Fernando Mário Junqueira Martins

  • Affiliations:
  • Universidade Federal de Pernambuco, Recife, PE, Brazil; Universidade Federal de Pernambuco, Recife, PE, Brazil; Universidade Federal de Pernambuco, Recife, PE, Brazil; Universidade do Minho, Braga, Portugal

  • Venue:
  • Proceedings of the 2008 ACM symposium on Applied computing
  • Year:
  • 2008

Abstract

BigBatch is an image processing environment designed to process batches of thousands of monochromatic documents. One of its distinctive, pioneering features is the ability to run in distributed environments such as clusters and grids. This paper presents the BigBatch tool and the results of a comparative analysis between cluster and grid configurations. The results show almost no difference in total execution time, indicating that performance is not a primary criterion for choosing between a cluster and a grid. However, other, qualitative aspects may affect this choice. The paper also considers these aspects and gives a general picture of how to use BigBatch successfully to process document images across many computers.