The book structure extraction competition with the resurgence software for part and chapter detection at Caen university

  • Authors:
  • Emmanuel Giguet;Nadine Lucas

  • Affiliations:
  • GREYC Cnrs, Caen Basse Normandie University, Caen Cedex, France;GREYC Cnrs, Caen Basse Normandie University, Caen Cedex, France

  • Venue:
  • INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

The GREYC Island team participated in the Structure Extraction Competition part of the INEX Book track for the second time, with the Resurgence software. We used a minimal strategy primarily based on top-down document representation with two levels, part and chapter. The main idea is to use a model describing relationships for elements in the document structure. Frontiers between high-level units are detected, parts and then chapters. Page is also used. The periphery center relationship is calculated on the entire document and reflected on each page. The strong points of the approach are that it deals with the entire document; it handles books without ToCs, and titles that are not represented in the ToC (e. g. preface); it is not dependent on lexicon, hence tolerant to OCR errors and language independent; it is simple and fast.