Correction of medical handwriting OCR based on semantic similarity

Authors:
Bartosz Broda; Maciej Piasecki
Affiliations:
Institute of Applied Informatics, Wrocław University of Technology, Poland (both authors)
Venue:
IDEAL'07: Proceedings of the 8th International Conference on Intelligent Data Engineering and Automated Learning
Year:
2007

Abstract

This paper presents a method for correcting the output of handwriting Optical Character Recognition (OCR) using semantic similarity. Several ways of extracting semantic similarity measures from a corpus are analysed. The best results were achieved by combining a text-window context with the Rank Weight Function. We also propose an algorithm that selects word sequences with high internal semantic similarity. The method was trained on and applied to a corpus of real medical documents written in Polish.
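The abstract names two components without detailing them here: a corpus-based similarity measure built from text-window contexts with the Rank Weight Function (RWF), and the selection of a word sequence with high internal similarity among OCR alternatives. The sketch below illustrates both under explicit assumptions and is not the authors' implementation. It assumes a simplified reading of RWF: each word's features are ranked by raw co-occurrence count and replaced by linearly decreasing rank weights, and words are compared with cosine similarity. The paper's exact ranking criterion and similarity measure may differ. The selection step is a hypothetical exhaustive search that picks one OCR candidate per position and maximises the summed pairwise similarity. All function names are illustrative, and the toy English corpus stands in for the Polish medical data.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations, product


def window_cooccurrences(tokens, window=2):
    """Count how often each word co-occurs with words at most `window` positions away."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[word][tokens[j]] += 1
    return counts


def rank_weight(features, top_k=50):
    """Simplified rank weighting (assumption, not the paper's exact RWF):
    keep the top_k features by count and replace each value with a weight that
    decreases linearly with rank (best-ranked feature = n, worst = 1)."""
    ranked = sorted(features.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
    n = len(ranked)
    return {feat: n - rank for rank, (feat, _) in enumerate(ranked)}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts; 0.0 if either is empty."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_similarity(tokens, window=2, top_k=50):
    """Return a function sim(w1, w2); words unseen in the corpus get similarity 0."""
    vectors = {w: rank_weight(c, top_k)
               for w, c in window_cooccurrences(tokens, window).items()}

    def sim(w1, w2):
        return cosine(vectors.get(w1, {}), vectors.get(w2, {}))

    return sim


def select_sequence(candidates, sim):
    """Hypothetical selection step: choose one OCR candidate per position so that
    the sum of pairwise similarities inside the sequence is maximal.
    Exhaustive search, so only feasible for short sequences and small candidate lists."""
    best, best_score = None, -1.0
    for seq in product(*candidates):
        score = sum(sim(a, b) for a, b in combinations(seq, 2))
        if score > best_score:
            best, best_score = list(seq), score
    return best, best_score


if __name__ == "__main__":
    # Toy English corpus; the paper used real Polish medical documents.
    corpus = ("patient reports chest pain and fever patient has fever "
              "and cough doctor prescribed aspirin for pain").split()
    sim = build_similarity(corpus, window=2)
    # Each inner list holds hypothetical OCR alternatives for one word position.
    candidates = [["patent", "patient"], ["fewer", "fever"], ["couch", "cough"]]
    best, score = select_sequence(candidates, sim)
    print(best)  # → ['patient', 'fever', 'cough']
```

Exhaustive search grows with the product of the candidate-list lengths, so a practical variant would limit the context window or use beam search. Combining the similarity score with the recogniser's own candidate probabilities would also be a natural extension; the paper's own selection algorithm may work differently.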