Text block geometric shape analysis

  • Authors: Hui Chao

  • Affiliations: Hewlett-Packard Laboratories

  • Venue: Proceedings of the 2006 ACM symposium on Document engineering
  • Year: 2006

Abstract

When a graphic artist designs a page, they envision a set of text blocks of arbitrary shapes, constrained by the page size and by image and graphics blocks with wrap-around properties. We call this the intended shape. What is seen on an actual page depends on the particular text content and on typographical constraints such as natural line breaking and justification. We call this the apparent shape. Our goal is to create document templates by extracting the text blocks' intended shapes from their apparent shapes. The main difficulty is that when line justification is jagged, the intended block shape is obscured. We solve this problem by analyzing the layout relations of all blocks on a page and applying an iterative process to find the most likely intended shapes.
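
To make the distinction between apparent and intended shape concrete, the following is a minimal sketch and not the method of the paper. It assumes left-aligned, ragged-right text and replaces the iterative maximum-likelihood process with a simple envelope heuristic: each line's right edge is bounded by the page margin or by the nearest obstacle block beside it, and consecutive lines sharing the same bound are assigned one common intended edge. All names (`Line`, `Box`, `right_limit`, `estimate_intended_right_edges`, the `snap` tolerance) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Line:
    """Apparent extent of one text line (hypothetical page coordinates)."""
    top: float
    bottom: float
    left: float
    right: float


@dataclass(frozen=True)
class Box:
    """An image or graphics block that text wraps around."""
    top: float
    bottom: float
    left: float
    right: float


def right_limit(line, obstacles, page_right):
    """Farthest x the line could reach before an obstacle or the page margin."""
    limit = page_right
    for box in obstacles:
        overlaps_vertically = box.top < line.bottom and line.top < box.bottom
        if overlaps_vertically and box.left >= line.left:
            limit = min(limit, box.left)
    return limit


def estimate_intended_right_edges(lines, obstacles, page_right, snap=12.0):
    """Guess one intended right edge per line from the jagged apparent edges.

    Assumption: consecutive lines with the same layout limit share one
    intended edge. That edge is the limit itself when the widest line in the
    run comes within `snap` units of it, otherwise the widest observed line.
    """
    limits = [right_limit(ln, obstacles, page_right) for ln in lines]
    edges = []
    for limit, run in groupby(zip(limits, lines), key=lambda pair: pair[0]):
        run_lines = [ln for _, ln in run]
        envelope = max(ln.right for ln in run_lines)
        edge = limit if limit - envelope <= snap else envelope
        edges.extend([edge] * len(run_lines))
    return edges


# Five ragged lines; an image occupies the right side next to lines 3 and 4.
image = Box(top=40.0, bottom=80.0, left=300.0, right=500.0)
text_lines = [
    Line(0.0, 20.0, 50.0, 470.0),
    Line(20.0, 40.0, 50.0, 495.0),
    Line(40.0, 60.0, 50.0, 280.0),
    Line(60.0, 80.0, 50.0, 296.0),
    Line(80.0, 100.0, 50.0, 492.0),
]
print(estimate_intended_right_edges(text_lines, [image], page_right=500.0))
# prints [500.0, 500.0, 300.0, 300.0, 500.0]
```

In this sketch the jagged apparent edges (470, 495, 280, 296, 492) resolve to a stepped intended outline that follows the image. A heuristic like this breaks down when every line in a run is short, for example a short final paragraph line standing alone, which is the kind of ambiguity a likelihood-based, page-wide analysis is meant to handle.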