The GREYC Island team participated for the second time in the Structure Extraction Competition of the INEX Book track, using the Resurgence software. We used a minimal strategy based primarily on a top-down document representation with two levels, part and chapter. The main idea is to use a model describing the relationships between elements of the document structure. Boundaries between high-level units are detected, first for parts and then for chapters. The page level is also used. The periphery-center relationship is calculated over the entire document and reflected on each page. The strengths of the approach are that it deals with the entire document; it handles books without ToCs, as well as titles that are not listed in the ToC (e.g. preface); it does not depend on a lexicon, and is hence tolerant of OCR errors and language independent; and it is simple and fast.
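To make the top-down, two-level idea concrete, the following is a minimal sketch and not the Resurgence implementation. It assumes each page already carries a single layout-only score, here called `break_score`, indicating how strongly the page looks like the start of a new unit. That score, the two thresholds, and the names `Page`, `Unit`, `split`, and `extract_structure` are all hypothetical and introduced only for illustration. The sketch does not model the periphery-center relationship. It only shows parts being cut first over the whole book and chapters then being cut inside each part, without any lexical information.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    number: int
    # Hypothetical layout-only feature in [0, 1]: how strongly this page
    # looks like the start of a new unit. No words or lexicon are involved.
    break_score: float


@dataclass
class Unit:
    level: str   # "part" or "chapter"
    start: int   # first page number of the unit
    end: int     # last page number of the unit (inclusive)
    children: list = field(default_factory=list)


def split(pages, threshold, level):
    """Cut a run of pages into units wherever break_score reaches threshold.

    Returns a list of (Unit, pages_of_that_unit) pairs. The first page of
    the run always opens a unit, whatever its score.
    """
    if not pages:
        return []
    spans = []
    start = 0
    for i in range(1, len(pages)):
        if pages[i].break_score >= threshold:
            spans.append((start, i))
            start = i
    spans.append((start, len(pages)))
    return [
        (Unit(level, pages[a].number, pages[b - 1].number), pages[a:b])
        for a, b in spans
    ]


def extract_structure(pages, part_threshold=0.8, chapter_threshold=0.5):
    """Top-down extraction: parts over the whole book, then chapters per part.

    Both thresholds are assumed values chosen for the example only.
    """
    parts = []
    for part, part_pages in split(pages, part_threshold, "part"):
        part.children = [
            chapter for chapter, _ in split(part_pages, chapter_threshold, "chapter")
        ]
        parts.append(part)
    return parts


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.6, 0.1, 0.85, 0.2, 0.55, 0.1]
    book = [Page(n + 1, s) for n, s in enumerate(scores)]
    for part in extract_structure(book):
        print(part.level, part.start, part.end)
        for chapter in part.children:
            print("  ", chapter.level, chapter.start, chapter.end)
```

Because the sketch relies only on a per-page layout score, it needs neither a table of contents nor readable text, which mirrors the tolerance to OCR errors and the language independence claimed above.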