Document Layout and Reading Sequence Analysis by Extended Split Detection Method

  • Authors:
  • Noboru Nakajima; Keiji Yamada; Jun Tsukumo

  • Venue:
  • DAS '98 Selected Papers from the Third IAPR Workshop on Document Analysis Systems: Theory and Practice
  • Year:
  • 1998

Abstract

This paper describes an Extended Split Detection Method that hierarchically segments a machine-printed page image with a complex layout into smaller layout elements. The method performs piecewise-linear segmentation using several kinds of separator elements, such as field separators, lines, edges of figures, and edges of white background areas. It represents the analyzed hierarchical layout as a tree data structure, whose nodes are traversed according to simple rules to generate the reading sequence. We demonstrate that the new method increases the correct character line segmentation rate by 15.5%, to 95.5%, and achieves a correct reading sequence generation rate of 88.1%.
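
The idea of a layout tree traversed by simple rules can be illustrated with a minimal sketch. The abstract does not state the actual traversal rules, so the ones used here are assumptions: children of a horizontally split node are read top to bottom, and children of a vertically split node are read left to right. The names `LayoutNode`, `split`, and `reading_sequence` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LayoutNode:
    """One layout element; hypothetical structure, not the paper's."""
    name: str
    x: int  # left edge of the element's bounding box
    y: int  # top edge of the element's bounding box
    # "h": children are stacked vertically (split by horizontal separators)
    # "v": children sit side by side (split by vertical separators)
    split: str = "leaf"
    children: List["LayoutNode"] = field(default_factory=list)


def reading_sequence(node: LayoutNode) -> List[str]:
    """Depth-first traversal under the assumed ordering rules."""
    if not node.children:
        return [node.name]
    # Assumed rule: top-to-bottom for "h" splits, left-to-right for "v" splits.
    key = (lambda c: c.y) if node.split == "h" else (lambda c: c.x)
    order: List[str] = []
    for child in sorted(node.children, key=key):
        order.extend(reading_sequence(child))
    return order


# A page with a title above two columns, children listed out of order on purpose.
page = LayoutNode("page", 0, 0, "h", [
    LayoutNode("columns", 0, 100, "v", [
        LayoutNode("right column", 300, 100),
        LayoutNode("left column", 0, 100),
    ]),
    LayoutNode("title", 0, 0),
])

print(reading_sequence(page))  # ['title', 'left column', 'right column']
```

The sketch covers only the traversal step and takes the tree as given. Building it from separator elements, which is the segmentation work the method performs, is not modeled here.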