Textual indexation of ancient documents

Authors:
Yann Leydier;Frank LeBourgeois;Hubert Emptoz
Affiliations:
Archimed Lyon, Villeurbanne, France;LIRIS, Villeurbanne Cedex, France;LIRIS, Villeurbanne Cedex, France
Venue:
Proceedings of the 2005 ACM symposium on Document engineering
Year:
2005



Abstract

In recent years, many levels of indexation have been developed to allow fast retrieval of digitized documents. Among all the ways of indexing a document, textual indexation allows the finest queries on the documents' content. Usually, the plain-text transcription of a digitized document is obtained by applying OCR (Optical Character Recognition) software to it. What if the OCR fails? Indeed, OCR systems are ineffective on low-quality printed documents and are unsuited to the processing of ancient fonts. Furthermore, OCR is not applicable to manuscript text recognition. In this paper we introduce two alternative methods of accessing text through the image: Computer Assisted Transcription and Word Spotting.