This paper describes the lifecycle of a digital historical document, from template-based structure definition, through content extraction from the scanned pages, to its final reconstitution as an electronic document that combines content and semantic information. It also presents the tools created to realise each stage of the lifecycle. The approach is described in the context of different types of typewritten documents relating to prisoners in World War II concentration camps, and is the result of a multinational collaboration under the MEMORIAL project, funded (€1.5M) by the European Union (www.memorial-project.info). Extensive tests with historians and archivists, together with an evaluation of the content extraction results, indicate that the semantics-driven approach as a whole outperforms both manual transcription and the semi-automated application of off-the-shelf OCR and the use of a conventional (text and layout) document format.