Book Layout Analysis: TOC Structure Extraction Engine

  • Authors:
  • Bodin Dresevic; Aleksandar Uzelac; Bogdan Radakovic; Nikola Todic

  • Affiliations:
  • Microsoft Development Center Serbia, Belgrade, Serbia 11000 (all four authors)

  • Venue:
  • Advances in Focused Retrieval
  • Year:
  • 2009

Abstract

Documents that have been scanned and then OCRed usually lack detailed layout and structural information. We present a book-specific layout analysis system used to extract table of contents (TOC) structure information from scanned and OCRed books. This system was used to support navigation in the live books search project. We describe a labeling scheme for the TOC sections of books, give a high-level overview of the book layout analysis system, and present the TOC Structure Extraction Engine. Finally, we report accuracy measurements of the system on a representative test set.
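The abstract does not spell out how individual TOC lines are labeled. As a purely illustrative sketch (not the authors' engine), one common first step is to split each OCRed TOC line into a section number, title text, and trailing page number, then infer the hierarchy from the depth of the numbering. The regular expression, the `TocEntry` type, and the helpers `parse_toc_line` and `link_parents` are hypothetical; the sketch assumes dot leaders, arabic or lower-case roman page numbers, and dotted section numbers such as `2.1`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern for one OCRed TOC line, e.g. "2.1 Related Work ........ 15".
TOC_LINE = re.compile(
    r"\s*"
    r"(?:(?P<num>\d+(?:\.\d+)*)\.?\s+)?"        # optional section number: "2", "2.1" or "2.1."
    r"(?P<title>.*?)"                          # title text; lazy so leaders are left for the next group
    r"(?:[\s._-]+(?P<page>\d+|[ivxlcdm]+))?"   # optional leaders, then arabic or lower-case roman page
    r"\s*"
)


@dataclass
class TocEntry:
    level: int             # 1 for top-level entries, 2 for "1.1", and so on
    number: Optional[str]  # section number as printed, or None if absent
    title: str
    page: Optional[str]    # kept as a string because front matter may use roman numerals


def parse_toc_line(line: str) -> Optional[TocEntry]:
    """Split one TOC line into its parts; return None for lines with no title text."""
    m = TOC_LINE.fullmatch(line)
    if m is None or not m.group("title").strip():
        return None
    num = m.group("num")
    # Assumption: nesting depth equals the number of dotted components.
    level = num.count(".") + 1 if num else 1
    return TocEntry(level, num, m.group("title").strip(), m.group("page"))


def link_parents(entries: List[TocEntry]) -> List[Optional[int]]:
    """For each entry, return the index of its nearest preceding entry with a smaller level."""
    parents: List[Optional[int]] = []
    stack: List[int] = []  # indices of currently open ancestors, levels strictly increasing
    for i, entry in enumerate(entries):
        # Close every open entry at the same or a deeper level.
        while stack and entries[stack[-1]].level >= entry.level:
            stack.pop()
        parents.append(stack[-1] if stack else None)
        stack.append(i)
    return parents


if __name__ == "__main__":
    lines = [
        "Preface ........ vii",
        "1 Introduction ........ 1",
        "1.1 Motivation .... 3",
        "1.2 Outline 7",
        "2 Related Work ........ 15",
    ]
    entries = [e for e in (parse_toc_line(l) for l in lines) if e is not None]
    for entry, parent in zip(entries, link_parents(entries)):
        print(entry, "parent:", parent)
```

Inferring depth from numbering alone is a deliberate simplification: real OCRed TOCs often drop or garble section numbers, in which case indentation, font cues, or a learned labeler would have to supply the level instead.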