Dominant color segmentation of administrative document images by hierarchical clustering

  • Authors:
  • Elodie Carel;Vincent Courboulay;Jean-Christophe Burie;Jean-Marc Ogier

  • Affiliations:
  • L3i, University of La Rochelle, La Rochelle, France;L3i, University of La Rochelle, La Rochelle, France;L3i, University of La Rochelle, La Rochelle, France;L3i, University of La Rochelle, La Rochelle, France

  • Venue:
  • Proceedings of the 2013 ACM symposium on Document engineering
  • Year:
  • 2013

Abstract

This paper addresses the segmentation of color document images in an industrial context. Automated Document Recognition (ADR) systems greatly reduce companies' time and resource costs by managing their large volumes of administrative documents and by optimizing their workflows. Most of the time, documents are binarized, a legacy of historical industrial processes, so color is discarded; yet colorimetric information can improve the process. In this paper, we propose an approach based on hierarchical clustering to extract dominant color masks from documents. Our dataset comprises different kinds of scanned administrative document images, such as invoices, forms, and letters, and we do not know a priori the number of dominant colors in these documents. The masks will then be fed to an OCR engine to provide extra information about the colorimetric context. The approach requires neither user interaction nor setting steps. Experiments on several types of documents show the relevance of the proposed approach.
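The abstract does not detail the algorithm, so the following is only a minimal sketch of the general idea: agglomerative (bottom-up hierarchical) clustering of pixel colors that yields one mask per dominant color without fixing the number of colors in advance. Everything named here is an assumption made for illustration and is not taken from the paper: the function `dominant_color_masks`, the coarse RGB quantization with `bin_size`, the centroid linkage with Euclidean distance, and in particular the fixed `merge_threshold` stopping rule, which stands in for whatever parameter-free criterion the authors use.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def dominant_color_masks(pixels, merge_threshold=60.0, bin_size=32):
    """Group RGB pixels into dominant colors by agglomerative clustering.

    pixels: list of (r, g, b) tuples. Returns a list of (centroid, mask)
    pairs, where mask[i] is True if pixels[i] belongs to that color.
    Hypothetical helper: threshold and bin size are illustrative choices.
    """
    # Coarse quantization keeps the number of initial clusters small.
    bins = defaultdict(list)
    for index, (r, g, b) in enumerate(pixels):
        bins[(r // bin_size, g // bin_size, b // bin_size)].append(index)

    # Each cluster is [centroid, member pixel indices].
    clusters = []
    for indices in bins.values():
        centroid = tuple(
            sum(pixels[i][c] for i in indices) / len(indices) for c in range(3)
        )
        clusters.append([centroid, indices])

    # Merge the two closest clusters until none are closer than the threshold,
    # so the number of dominant colors is an output, not an input.
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: dist(clusters[pair[0]][0], clusters[pair[1]][0]),
        )
        if dist(clusters[a][0], clusters[b][0]) > merge_threshold:
            break
        (ca, ia), (cb, ib) = clusters[a], clusters[b]
        total = len(ia) + len(ib)
        # The merged centroid is the size-weighted mean of the two centroids.
        merged = tuple(
            (ca[c] * len(ia) + cb[c] * len(ib)) / total for c in range(3)
        )
        clusters[a] = [merged, ia + ib]
        del clusters[b]  # safe: combinations() guarantees a < b

    # Build one boolean mask per dominant color, largest cluster first.
    clusters.sort(key=lambda cluster: len(cluster[1]), reverse=True)
    result = []
    for centroid, indices in clusters:
        mask = [False] * len(pixels)
        for i in indices:
            mask[i] = True
        result.append((centroid, mask))
    return result
```

For a flattened image given as a list of RGB tuples, each returned mask marks the pixels of one dominant color (for example paper background, printed text, or a colored stamp), and such masks could then be handed to a downstream OCR stage. The pairwise search is quadratic in the number of occupied bins, which is acceptable for a sketch because quantization bounds that number.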