This paper presents a method for correcting handwriting Optical Character Recognition (OCR) output based on semantic similarity. Several ways of extracting semantic similarity measures from a corpus are analysed; the best results were achieved by combining a text-window context with the Rank Weight Function. An algorithm is proposed for selecting the word sequence with high internal similarity. The method was trained on and applied to a corpus of real medical documents written in Polish.
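The idea of choosing, among OCR candidates, the word sequence whose members are most similar to one another can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes raw co-occurrence counts compared by cosine similarity in place of the Rank Weight Function, an exhaustive search over candidate combinations, and a toy English corpus. All names (`window_vectors`, `internal_similarity`, `best_sequence`) and all data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product
from math import sqrt


def window_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def internal_similarity(words, vectors):
    """Mean pairwise similarity of the words in one candidate sequence."""
    pairs = [(a, b) for i, a in enumerate(words) for b in words[i + 1:]]
    if not pairs:
        return 0.0
    empty = Counter()  # words unseen in the corpus get an empty vector
    total = sum(cosine(vectors.get(a, empty), vectors.get(b, empty)) for a, b in pairs)
    return total / len(pairs)


def best_sequence(candidates, vectors):
    """Pick one candidate per position so that internal similarity is maximal.

    Assumption: exhaustive search, feasible only for short candidate lists.
    """
    return max(product(*candidates), key=lambda seq: internal_similarity(seq, vectors))


# Hypothetical toy corpus standing in for the training documents.
corpus = [
    ["patient", "reports", "chest", "pain"],
    ["patient", "has", "chest", "pain"],
    ["patient", "has", "fever"],
    ["the", "paint", "is", "red"],
    ["red", "paint", "on", "the", "wall"],
]
vectors = window_vectors(corpus, window=2)

# Hypothetical OCR output: one list of candidate words per position.
candidates = [["patient"], ["paint", "pain"], ["chest", "chess"]]
best = best_sequence(candidates, vectors)
print(best)  # ('patient', 'pain', 'chest')
```

In this toy setting "pain" and "chest" win because they share context words with "patient", whereas "paint" shares none and "chess" never occurs in the corpus. A practical system would also need to weigh the recognizer's own confidence in each candidate, which this sketch leaves out.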