Performance comparison of six algorithms for page segmentation

  • Authors:
  • Faisal Shafait; Daniel Keysers; Thomas M. Breuel

  • Affiliations:
  • Image Understanding and Pattern Recognition (IUPR) research group, German Research Center for Artificial Intelligence (DFKI), and Technical University of Kaiserslautern, Kaiserslautern, Germany (all three authors)

  • Venue:
  • DAS'06: Proceedings of the 7th International Conference on Document Analysis Systems
  • Year:
  • 2006

Abstract

This paper presents a quantitative comparison of six algorithms for page segmentation: X-Y cut, smearing, whitespace analysis, constrained text-line finding, Docstrum, and a Voronoi-diagram-based approach. The evaluation is performed using a subset of the UW-III collection commonly used for evaluation, with a separate training set for parameter optimization. We compare the results using both default and optimized parameters. In the course of the evaluation, the strengths and weaknesses of each algorithm are analyzed, and it is shown that no single algorithm outperforms all the others. However, we observe that the three best-performing algorithms are those based on constrained text-line finding, Docstrum, and the Voronoi diagram.