Paper guides and reference books in pharmacology, veterinary medicine and crop protection are often presented as semi-structured text. "Key words", for instance the names of diseases and drugs, and the relationships between them are of great importance for obtaining useful information such as advice and instructions. Defining these relationships is a significant problem when the aim is to transform a relatively large amount of semi-structured text into an intelligent computer-based system. The paper briefly presents the detection and correction of OCR errors in the process of transforming a Bulgarian crop protection reference book into a relational database. This solution substantially changes the form in which the data are presented and accessed, but it does not change the essence of the data itself.