There is a significant need for a realistic dataset on which to evaluate layout analysis methods and examine their performance in detail. This paper presents a new dataset (and the methodology used to create it) based on a wide range of contemporary documents. Strong emphasis is placed on comprehensive and detailed representation of both complex and simple layouts, and on colour originals. In-depth information is recorded both at the page and region level. Ground truth is efficiently created using a new semi-automated tool and stored in a new comprehensive XML representation, the PAGE format. The dataset can be browsed and searched via a web-based front end to the underlying database, and suitable subsets (relevant to specific evaluation goals) can be selected and downloaded.