Towards the processing of historic documents

Authors: Björn Gottfried; Lothar Meyer-Lerbs
Affiliations: Centre for Computing and Communication Technologies, University of Bremen, Germany; Centre for Computing and Communication Technologies, University of Bremen, Germany
Venue: NLP4DL'09/AT4DL'09 Proceedings of the 2009 international conference on Advanced language technologies for digital libraries
Year: 2009

Abstract

This chapter describes methods required for transforming complex document images into text. The goal is to make available to search engines the contents of documents that are not born-digital but have been converted from a physical medium to a digital format. Established optical character recognition methods fail on documents for which no assumptions can be made about the symbols they contain, which may well be unknown; historic documents are the example domain par excellence. The chapter, however, has a much broader goal: it outlines fundamental problems, as well as a methodology, in dealing with documents containing unknown and arbitrary symbols, in order to provide a basis for discussion and future work within the digital library community. In particular, future advances will require closer interaction among researchers concerned with such diverse topics as document digitisation, reproduction, and preservation, as well as search engines, cross-language processing, mobile libraries, and many further areas. By adopting a general view of the issues presented, the chapter aims to sensitise researchers in these areas to the problems encountered in processing complex, and especially historic, documents.