A Fast Algorithm for Bottom-Up Document Layout Analysis

  • Authors:
  • Anikó Simon;Jean-Christophe Pret;A. Peter Johnson

  • Affiliations:
  • -;-;-

  • Venue:
  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Year:
  • 1997

Quantified Score

Hi-index 0.14

Visualization

Abstract

This paper describes a new bottom-up method for document layout analysis. The algorithm was implemented in the CLiDE (Chemical Literature Data Extraction) system (http://chem.leeds.ac.uk/ICAMS/CLiDE.html), but the method described here is suitable for a broader range of documents. It is based on Kruskal's algorithm and uses a special distance-metric between the components to construct the physical page structure. The method has all the major advantages of bottom-up systems: independence from different text spacing and independence from different block alignments. The algorithms computational complexity is reduced to linear by using heuristics and path-compression.