Postcorrection of OCR results for text documents is usually based on electronic dictionaries. When texts from a specific thematic area are scanned, conventional dictionaries often miss a considerable number of tokens. Furthermore, if word frequencies are stored with the entries, these frequencies will not properly reflect the frequencies found in the given thematic area. Both shortcomings reduce correction adequacy. We report on a series of experiments in which we compare (1) fixed, static large-scale dictionaries (including proper names and abbreviations), (2) dynamic dictionaries obtained by automatically analyzing the vocabulary of web pages from a given domain, and (3) mixed dictionaries. Our experiments, which address English and German document collections from a variety of fields, show that such dynamic dictionaries can significantly improve coverage for the given thematic area and help to improve the quality of lexical postcorrection methods.
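The comparison of static, dynamic, and mixed dictionaries can be made concrete with a toy sketch. Here each dictionary is a Python `Counter` mapping words to frequencies. Coverage is the share of ground-truth tokens found in a dictionary. Correction replaces an out-of-dictionary token with the closest entry within edit distance 2 and breaks ties by higher frequency. The mixing rule (summing frequencies), the distance bound, the tie-breaking, and all word lists below are hypothetical illustrations, not the authors' correction method or data.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (or match)
        prev = curr
    return prev[-1]


def coverage(tokens, dictionary) -> float:
    """Fraction of tokens that occur as dictionary entries."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in dictionary) / len(tokens)


def mix(static: Counter, dynamic: Counter) -> Counter:
    """Hypothetical mixing rule: union of entries with summed frequencies."""
    return static + dynamic


def correct(token: str, dictionary: Counter, max_dist: int = 2) -> str:
    """Keep known tokens; otherwise pick the nearest entry, preferring frequent ones."""
    if token in dictionary:
        return token
    best, best_key = token, None
    for entry, freq in dictionary.items():
        d = levenshtein(token, entry)
        if d > max_dist:
            continue
        key = (d, -freq, entry)  # smaller distance first, then higher frequency
        if best_key is None or key < best_key:
            best, best_key = entry, key
    return best  # unchanged if no entry lies within max_dist


# Hypothetical toy data: a general-purpose static dictionary and a small
# "web-crawled" dictionary for a medical domain.
static = Counter({"the": 1000, "of": 900, "patient": 5, "dose": 3})
dynamic = Counter({"the": 50, "patient": 40, "dosage": 30, "ibuprofen": 20})
mixed = mix(static, dynamic)

truth = ["the", "patient", "dosage", "of", "ibuprofen"]
ocr = ["the", "patlent", "dosage", "of", "ibuprofcn"]

if __name__ == "__main__":
    for name, d in [("static", static), ("dynamic", dynamic), ("mixed", mixed)]:
        print(name, coverage(truth, d), [correct(t, d) for t in ocr])
```

With only the static dictionary, the domain term *dosage* is missing and gets miscorrected to *dose*, while *ibuprofcn* cannot be repaired at all; the mixed dictionary avoids both errors. This mirrors the coverage problem described in the abstract, though on invented data.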