Accessing the content of Greek historical documents

Authors:
Anastasios Kesidis;Eleni Galiotou;Basilis Gatos;Aristomenis Lampropoulos;Ioannis Pratikakis;Ioanna Manolessou;Angela Ralli
Affiliations:
National Center for Scientific Research, "Demokritos", Agia Paraskevi, Athens, Greece;Technological Educational Institution of Athens, Athens, Greece;National Center for Scientific Research, "Demokritos", Agia Paraskevi, Athens, Greece;University of Piraeus, Piraeus, Greece;National Center for Scientific Research, "Demokritos", Agia Paraskevi, Athens, Greece;University of Patras, Patras, Greece;University of Patras, Patras, Greece
Venue:
Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
Year:
2009

Abstract

In this paper, we propose an alternative method for accessing the content of Greek historical documents printed during the 17th and 18th centuries. Words are searched for directly in the digitized documents by means of word spotting, without the use of an optical character recognition (OCR) engine. We describe a methodology in which synthetic word images are created from keywords. These images are compared with all the words in the digitized documents, and user feedback is used to refine the search procedure. To improve the efficiency of access and search, we use natural language processing techniques that comprise (i) a morphological generator for early Modern Greek, which allows users to search the documents using only a word stem and to locate all the corresponding inflected word forms, and (ii) a synonym dictionary, which facilitates access to the semantic context of the documents and enriches the results of the search process.
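The stem-based search with synonym enrichment described in the abstract can be illustrated by a minimal query-expansion sketch. It assumes a hypothetical lexicon that maps each stem to a toy inflection class and a lemma, plus a hypothetical synonym table keyed by lemma (`INFLECTION_CLASSES`, `LEXICON`, `SYNONYMS`, `inflect` and `expand_query` are all illustrative names). The endings use simplified monotonic spelling. This is not the paper's morphological generator for early Modern Greek, which would also have to handle accent shifts, archaic forms and polytonic orthography.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy inflection classes in simplified monotonic spelling.
INFLECTION_CLASSES: Dict[str, List[str]] = {
    # masculine nouns in -ος whose accent stays on the stem, e.g. πόντος
    "masc_os": ["ος", "ου", "ω", "ον", "ο", "ε", "οι", "ων", "ους"],
    # feminine nouns in -α; forms with a shifted accent (gen. pl.) are omitted
    "fem_a": ["α", "ας", "ες"],
}

# Hypothetical lexicon: stem -> (inflection class, lemma).
LEXICON: Dict[str, Tuple[str, str]] = {
    "θάλασσ": ("fem_a", "θάλασσα"),
    "πόντ": ("masc_os", "πόντος"),
}

# Hypothetical synonym dictionary keyed by lemma.
SYNONYMS: Dict[str, List[str]] = {
    "θάλασσα": ["πόντος"],
    "πόντος": ["θάλασσα"],
}

# Reverse index so a synonym lemma can be inflected through its stem.
STEM_OF_LEMMA: Dict[str, str] = {lemma: stem for stem, (_, lemma) in LEXICON.items()}


def inflect(stem: str) -> Set[str]:
    """Return every form of a known stem; an unknown stem is returned unchanged."""
    if stem not in LEXICON:
        return {stem}
    inflection_class, _ = LEXICON[stem]
    return {stem + ending for ending in INFLECTION_CLASSES[inflection_class]}


def expand_query(stem: str, use_synonyms: bool = True) -> Set[str]:
    """Inflected forms of the stem, optionally plus all forms of its synonyms."""
    forms = inflect(stem)
    if use_synonyms and stem in LEXICON:
        _, lemma = LEXICON[stem]
        for synonym in SYNONYMS.get(lemma, []):
            forms |= inflect(STEM_OF_LEMMA[synonym])
    return forms
```

For example, `expand_query("θάλασσ", use_synonyms=False)` yields the three forms θάλασσα, θάλασσας and θάλασσες. With synonyms enabled, the nine toy forms of πόντος are added. In a word-spotting setting, each generated form would then be rendered as a synthetic word image and searched for in the document images.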
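The abstract compares synthetic keyword images with every word in the documents and refines the search with user feedback. That loop can be sketched as below, under stated assumptions. Each binary word image is described by simple zoning features (ink density per grid cell), and words are ranked by Euclidean distance to the query. The user feedback is modelled as averaging the query vector with the vectors of the results the user confirms, a generic relevance-feedback scheme. These features and this update rule are illustrative choices, not necessarily those used in the paper. Rendering the synthetic image from a keyword, for instance in a font resembling the documents' typeface, is left out. The function names are hypothetical.

```python
from math import dist
from typing import List, Sequence

Image = List[List[int]]  # binary word image: 1 = ink, 0 = background


def zoning_features(img: Image, rows: int = 2, cols: int = 4) -> List[float]:
    """Ink density in each cell of a rows x cols grid laid over the image."""
    h, w = len(img), len(img[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            # Empty cells (image smaller than the grid) count as no ink.
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats


def rank(query: Sequence[float], words: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the document words, closest to the query first."""
    return sorted(range(len(words)), key=lambda i: dist(query, words[i]))


def refine(query: Sequence[float], words: Sequence[Sequence[float]],
           confirmed: Sequence[int]) -> List[float]:
    """Average the query with the feature vectors the user marked as correct."""
    vecs = [list(query)] + [list(words[i]) for i in confirmed]
    return [sum(col) / len(vecs) for col in zip(*vecs)]
```

In use, `zoning_features` would be applied to the synthetic keyword image and to every segmented word image. `rank` then orders the candidates. After the user marks some of the top results as correct, `refine` produces a new query vector for the next ranking pass, pulling it toward the appearance of the actual printed words.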