Effect of OCR-errors on the transformation of semi-structured text data into relational database

  • Authors:
  • Kolyo Z. Onkov

  • Affiliations:
  • Agricultural University, Plovdiv, Bulgaria

  • Venue:
  • Proceedings of The Third Workshop on Analytics for Noisy Unstructured Text Data
  • Year:
  • 2009

Abstract

Paper-based guides and reference books in the fields of pharmacology, veterinary medicine and crop protection are often presented as semi-structured text data. "Key words" (for instance, the names of diseases and drugs) and the relationships between them are of great importance for obtaining useful information such as advice and instructions. Defining these relationships is a significant problem when the aim is to transform a relatively large amount of semi-structured text data into an intelligent computer-based system. The paper briefly presents OCR error detection and correction in the process of transforming a Bulgarian crop protection reference book into a relational database. Finally, this solution leads to a substantial change in how the data are presented and accessed, without changing the essence of the data itself.
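The abstract does not describe the detection and correction procedure itself. As a purely illustrative sketch, assuming a controlled vocabulary of key words is available, OCR-garbled names could be snapped to their closest known form and the resulting disease/drug relationships stored in a many-to-many relational schema. The vocabularies, the helpers `correct_term` and `build_database`, the 0.8 similarity cutoff, and the table names are all hypothetical, and Python's `difflib` similarity ratio merely stands in for whatever error model the paper actually uses.

```python
import difflib
import sqlite3

# Hypothetical controlled vocabularies of key words (not taken from the paper).
KNOWN_DISEASES = ["powdery mildew", "late blight", "downy mildew"]
KNOWN_DRUGS = ["sulfur", "copper hydroxide", "mancozeb"]


def correct_term(token, vocabulary, cutoff=0.8):
    """Map a possibly OCR-garbled token to the closest known key word.

    Uses difflib's SequenceMatcher ratio as an assumed stand-in error model.
    Returns None when nothing is similar enough, so the token can be
    flagged for manual review instead of being guessed.
    """
    matches = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def build_database(records):
    """Load (disease, drug) pairs into a small, hypothetical relational schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE disease (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE drug (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        -- Junction table: a disease may be treated by many drugs and vice versa.
        CREATE TABLE treatment (
            disease_id INTEGER REFERENCES disease(id),
            drug_id INTEGER REFERENCES drug(id),
            PRIMARY KEY (disease_id, drug_id)
        );
        """
    )
    rejected = []
    for raw_disease, raw_drug in records:
        disease = correct_term(raw_disease, KNOWN_DISEASES)
        drug = correct_term(raw_drug, KNOWN_DRUGS)
        if disease is None or drug is None:
            # Keep uncertain OCR output out of the tables.
            rejected.append((raw_disease, raw_drug))
            continue
        conn.execute("INSERT OR IGNORE INTO disease(name) VALUES (?)", (disease,))
        conn.execute("INSERT OR IGNORE INTO drug(name) VALUES (?)", (drug,))
        # Link the two rows through the junction table.
        conn.execute(
            """INSERT OR IGNORE INTO treatment(disease_id, drug_id)
               SELECT disease.id, drug.id FROM disease, drug
               WHERE disease.name = ? AND drug.name = ?""",
            (disease, drug),
        )
    conn.commit()
    return conn, rejected


if __name__ == "__main__":
    conn, rejected = build_database([
        ("powdery mi1dew", "su1fur"),  # OCR read "l" as "1"
        ("late b1ight", "mancozeh"),   # OCR read "l" as "1" and "b" as "h"
        ("xyz", "copper hydroxide"),   # no disease is close enough
    ])
    rows = conn.execute(
        """SELECT disease.name, drug.name FROM treatment
           JOIN disease ON disease.id = treatment.disease_id
           JOIN drug ON drug.id = treatment.drug_id
           ORDER BY disease.name"""
    ).fetchall()
    print(rows)
    print("needs review:", rejected)
```

Running it prints the two corrected disease/drug pairs and lists the unmatched record for manual review. Rejecting low-similarity tokens rather than forcing a match is one way to keep OCR noise from silently creating wrong relationships in the database.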