Accessing the content of Greek historical documents

  • Authors:
  • Anastasios Kesidis;Eleni Galiotou;Basilis Gatos;Aristomenis Lampropoulos;Ioannis Pratikakis;Ioanna Manolessou;Angela Ralli

  • Affiliations:
  • National Center for Scientific Research, "Demokritos", Agia Paraskevi, Athens, Greece;Technological Educational Institution of Athens, Athens, Greece;National Center for Scientific Research, "Demokritos", Agia Paraskevi, Athens, Greece;University of Piraeus, Piraeus, Greece;National Center for Scientific Research, "Demokritos", Agia Paraskevi, Athens, Greece;University of Patras, Patras, Greece;University of Patras, Patras, Greece

  • Venue:
  • Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
  • Year:
  • 2009

Abstract

In this paper, we propose an alternative method for accessing the content of Greek historical documents printed during the 17th and 18th centuries: words are searched directly in the digitized documents by means of word spotting, without the use of an optical character recognition engine. We describe a methodology in which synthetic word images are created from keywords. These images are compared to all the words in the digitized documents, and user feedback is used to refine the search procedure. To improve the efficiency of accessing and searching, we use natural language processing techniques comprising (i) a morphological generator for early Modern Greek, which lets users search documents using only a word stem and locate all the corresponding inflected word forms, and (ii) a synonym dictionary, which facilitates access to the semantic context of documents and enriches the results of the search process.
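
The query-expansion step described in the abstract (stem to inflected forms, plus synonyms) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' morphological generator: the function names `inflect` and `expand_query`, the ending list, and the shape of the synonym table are all hypothetical, and the endings shown cover only a single toy noun paradigm.

```python
from typing import Dict, List, Optional

# Toy ending list for one noun paradigm. This is an assumption made for
# illustration only; the paper's generator for early Modern Greek is not
# described in the abstract and is certainly richer than this.
TOY_ENDINGS: List[str] = ["ος", "ου", "ο", "ε", "οι", "ων", "ους"]


def inflect(stem: str, endings: Optional[List[str]] = None) -> List[str]:
    """Return candidate word forms by appending each ending to the stem."""
    if endings is None:
        endings = TOY_ENDINGS
    return [stem + ending for ending in endings]


def expand_query(
    stem: str,
    synonym_stems: Optional[Dict[str, List[str]]] = None,
    endings: Optional[List[str]] = None,
) -> List[str]:
    """Expand a stem into the forms of the stem and of its synonym stems.

    `synonym_stems` is a hypothetical lookup table mapping a stem to the
    stems of its synonyms. Order is preserved and duplicates are dropped.
    """
    if synonym_stems is None:
        synonym_stems = {}
    stems = [stem] + synonym_stems.get(stem, [])
    keywords: List[str] = []
    for candidate in stems:
        for form in inflect(candidate, endings):
            if form not in keywords:
                keywords.append(form)
    return keywords


if __name__ == "__main__":
    # Each printed keyword would then be rendered as a synthetic word image
    # and matched against the word images of the digitized documents.
    for keyword in expand_query("λόγ"):
        print(keyword)
```

In such a design, expansion happens entirely on the text side, before any image is synthesized, so the word-spotting stage only ever receives a flat list of keywords to render and compare.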