Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction

Author: Kemal Oflazer
Affiliation: Bilkent University
Venue: Computational Linguistics
Year: 1996

Abstract

This paper presents the notion of error-tolerant recognition with finite-state recognizers, along with results from some applications. Error-tolerant recognition enables the recognition of strings that deviate mildly from any string in the regular set recognized by the underlying finite-state recognizer. Such recognition has applications to error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After describing the concepts and algorithms involved, we give examples from two applications, morphological analysis and spelling correction. In the context of morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected and morphologically analyzed concurrently. We present an application of this to error-tolerant analysis of the agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology has been fully captured by a single (and possibly very large) finite-state transducer, regardless of the word formation processes and morphographemic phenomena involved. In the context of spelling correction, error-tolerant recognition can be used to enumerate candidate correct forms from a given misspelled string within a given edit distance. Error-tolerant recognition can be applied to spelling correction for any language if (a) it has a word list comprising all inflected forms, or (b) its morphology has been fully described by a finite-state transducer. We present experimental results for spelling correction for a number of languages. These results indicate that such recognition works very efficiently for candidate generation in spelling correction for many European languages (English, Dutch, French, German, and Italian, among others) with very large word lists of root and inflected forms (some containing well over 200,000 forms), generating all candidate solutions within 10 to 45 milliseconds (with an edit distance of 1) on a SPARCstation 10/41.
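The candidate-generation idea can be sketched in Python as a depth-first traversal of a deterministic recognizer that extends one row of an edit-distance table per transition and abandons a path as soon as no continuation can come within the threshold t. This is a minimal sketch under stated assumptions, not the paper's implementation: the nested-dictionary DFA encoding, the hypothetical helpers `build_trie` and `error_tolerant_recognize`, and the choice of unit-cost insertion, deletion, substitution, and adjacent transposition are all illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical DFA encoding (an assumption): transitions[state][symbol] -> next state.
Transitions = Dict[int, Dict[str, int]]


def build_trie(words: List[str]) -> Tuple[Transitions, Set[int]]:
    """Build an acyclic recognizer (a trie) for a word list; state 0 is the start."""
    transitions: Transitions = {0: {}}
    finals: Set[int] = set()
    for word in words:
        state = 0
        for ch in word:
            if ch not in transitions[state]:
                new_state = len(transitions)
                transitions[state][ch] = new_state
                transitions[new_state] = {}
            state = transitions[state][ch]
        finals.add(state)
    return transitions, finals


def error_tolerant_recognize(x: str, transitions: Transitions, start: int,
                             finals: Set[int], t: int) -> List[str]:
    """Return every string accepted by the DFA within edit distance t of x.

    Assumed edit operations: insertion, deletion, substitution, and
    transposition of adjacent symbols, each costing 1.
    """
    m = len(x)
    results: List[str] = []

    def dfs(state: int, y: List[str], rows: List[List[int]]) -> None:
        # rows[j][i] = ed(x[:i], y[:j]); rows[-1] belongs to the current prefix y.
        n = len(y)
        prev = rows[-1]
        if state in finals and prev[m] <= t:
            results.append("".join(y))
        for sym, nxt in transitions.get(state, {}).items():
            row = [n + 1] + [0] * m
            for i in range(1, m + 1):
                row[i] = min(prev[i] + 1,                      # sym is an extra symbol
                             row[i - 1] + 1,                   # x[i-1] is missing
                             prev[i - 1] + (x[i - 1] != sym))  # match or substitution
                if i > 1 and n > 0 and x[i - 1] == y[n - 1] and x[i - 2] == sym:
                    row[i] = min(row[i], rows[-2][i - 2] + 1)  # adjacent transposition
            # Cut-off test: keep the path only if some input prefix whose length
            # is within t of the new candidate length is at most t edits away.
            lo, hi = max(0, n + 1 - t), min(m, n + 1 + t)
            if lo <= hi and min(row[lo:hi + 1]) <= t:
                y.append(sym)
                rows.append(row)
                dfs(nxt, y, rows)
                rows.pop()
                y.pop()

    dfs(start, [], [list(range(m + 1))])
    return results
```

For example, with a trie built from `cat`, `cart`, `act`, `dog`, and `coat`, calling `error_tolerant_recognize("cta", transitions, 0, finals, 1)` returns `["cat"]`, reached through a single transposition. Only paths that survive the cut-off test are expanded, so the work depends on how many prefixes stay close to the input rather than on the size of the word list.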
For spelling correction in Turkish, error-tolerant recognition operating with a (circular) recognizer of Turkish words (with about 29,000 states and 119,000 transitions) can generate all candidate words in less than 20 milliseconds, with an edit distance of 1.
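A circular recognizer accepts infinitely many strings, so an exhaustive traversal would never finish, yet a bounded error-tolerant search still terminates. A sketch of why, assuming the search abandons a candidate prefix Y[1..n] for the input X[1..m] once a cut-off quantity of the following form exceeds the threshold t (the names cuted, ed, l, and u are illustrative):

```latex
\operatorname{cuted}(X[1..m],\,Y[1..n]) = \min_{l \le i \le u} \operatorname{ed}(X[1..i],\,Y[1..n]),
\qquad l = \max(1,\,n-t), \quad u = \min(m,\,n+t)

% Any candidate prefix longer than m + t is cut off:
n > m + t \;\Longrightarrow\; \operatorname{ed}(X[1..i],\,Y[1..n]) \ge n - i \ge n - m > t
\quad \text{for every } i \le m
```

Because edit distance is at least the difference in lengths, no prefix longer than m + t can survive, so the depth-first search explores only paths of length at most m + t, however many cycles (such as suffix loops) the recognizer contains.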