The book structure extraction competition with the resurgence software for part and chapter detection at Caen university

Authors:
Emmanuel Giguet;Nadine Lucas
Affiliations:
GREYC CNRS, Caen Basse Normandie University, Caen Cedex, France; GREYC CNRS, Caen Basse Normandie University, Caen Cedex, France
Venue:
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval
Year:
2010

Abstract

The GREYC Island team participated for the second time in the Structure Extraction Competition, part of the INEX Book track, with the Resurgence software. We used a minimal strategy based primarily on a top-down document representation with two levels, part and chapter. The main idea is to use a model describing the relationships between elements in the document structure. Boundaries between high-level units are detected, first for parts and then for chapters. The page is also used as a unit. The periphery-center relationship is calculated over the entire document and reflected on each page. The strong points of the approach are that it deals with the entire document; it handles books without ToCs, as well as titles that are not represented in the ToC (e.g. the preface); it does not depend on a lexicon, and is hence tolerant of OCR errors and language independent; and it is simple and fast.
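
The two-level, top-down idea described in the abstract can be illustrated with a minimal sketch. This is not the Resurgence algorithm, whose details are not given here. It assumes a single hypothetical lexicon-free page feature (`top_margins`, the blank space above the first text block of each page) and two made-up thresholds (`part_ratio`, `chapter_ratio`). A reference value is computed once over the whole document and then applied page by page, first to find part boundaries and then to find chapter boundaries inside each part.

```python
from statistics import median


def split_top_down(top_margins, part_ratio=3.0, chapter_ratio=1.5):
    """Illustrative two-level split (hypothetical, not the Resurgence method).

    top_margins: one non-negative number per page (assumed layout feature).
    Returns a list of dicts, one per part, with 0-based page indices.
    """
    if not top_margins:
        return []

    # Document-wide reference value, computed once over all pages.
    typical = median(top_margins)

    # Level 1: pages whose margin is far above the reference start a part.
    part_starts = [i for i, m in enumerate(top_margins)
                   if m >= part_ratio * typical]
    if not part_starts or part_starts[0] != 0:
        part_starts.insert(0, 0)  # the first page always opens a part

    structure = []
    for k, start in enumerate(part_starts):
        end = part_starts[k + 1] if k + 1 < len(part_starts) else len(top_margins)
        # Level 2: inside the part, moderately large margins start a chapter.
        chapter_starts = [
            i for i in range(start + 1, end)
            if chapter_ratio * typical <= top_margins[i] < part_ratio * typical
        ]
        structure.append({"part_start": start, "chapter_starts": chapter_starts})
    return structure


margins = [10, 2, 2, 4, 2, 2, 9, 2, 4, 2]
result = split_top_down(margins)
print(result)
```

Because the sketch only compares numeric layout values, it needs no word list and no table of contents, which mirrors the lexicon-independence claimed in the abstract. The feature and thresholds themselves are assumptions chosen for illustration.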