This paper presents the notion of error-tolerant recognition with finite-state recognizers, along with results from several applications. Error-tolerant recognition enables the recognition of strings that deviate mildly from some string in the regular set recognized by the underlying finite-state recognizer. Such recognition has applications in error-tolerant morphological processing, spelling correction, and approximate string matching in information retrieval. After describing the concepts and algorithms involved, we give examples from two applications. In morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected and morphologically analyzed concurrently. We apply this to error-tolerant analysis of the agglutinative morphology of Turkish words. The algorithm can be applied to morphological analysis of any language whose morphology has been fully captured by a single (possibly very large) finite-state transducer, regardless of the word-formation processes and morphographemic phenomena involved. In spelling correction, error-tolerant recognition can be used to enumerate the candidate correct forms that lie within a given edit distance of a misspelled string. It can be applied to spelling correction for any language that either (a) has a word list comprising all inflected forms, or (b) has a morphology fully described by a finite-state transducer. We present experimental results for spelling correction in a number of languages. These results indicate that error-tolerant recognition generates candidates very efficiently for many European languages (English, Dutch, French, German, and Italian, among others) with very large word lists of root and inflected forms (some containing well over 200,000 forms): all candidate solutions within an edit distance of 1 are generated in 10 to 45 milliseconds on a SPARCstation 10/41.
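One way to picture concurrent correction and analysis is a depth-first walk over the transducer that extends a candidate surface string one arc at a time, keeps one row of the standard dynamic-programming edit-distance table between the input and that candidate, and abandons a branch once every entry of the row exceeds the threshold t. The sketch below is an illustration under stated assumptions, not the paper's implementation: it uses plain Levenshtein distance (insertion, deletion, substitution), assumes every arc reads exactly one surface character, and encodes a hypothetical toy fragment of Turkish noun morphology (ev "house", plural -ler, locative -de) with made-up analysis tags. The names `add_arc` and `error_tolerant_analyze` are illustrative only.

```python
from collections import defaultdict


def add_arc(fst, src, dst, surface, lexical):
    """Add a multi-character arc as a chain of single-character arcs.

    Assumption: the lexical output is emitted on the first character;
    intermediate states are tuples so they never collide with user states.
    """
    prev = src
    for k, ch in enumerate(surface):
        nxt = dst if k == len(surface) - 1 else (src, dst, surface, k)
        fst[prev].append((ch, lexical if k == 0 else "", nxt))
        prev = nxt


def error_tolerant_analyze(fst, start, finals, word, t):
    """Return (surface, analysis, distance) for every accepting path whose
    surface string is within Levenshtein distance t of `word`."""
    m = len(word)
    results = []

    def dfs(state, surface, analysis, row):
        # row[i] = edit distance between word[:i] and the current surface prefix.
        if state in finals and row[m] <= t:
            results.append((surface, analysis, row[m]))
        for ch, out, nxt in fst[state]:
            new_row = [row[0] + 1]
            for i in range(1, m + 1):
                new_row.append(min(
                    row[i] + 1,                         # candidate has an extra ch
                    new_row[i - 1] + 1,                 # word[i-1] has no counterpart
                    row[i - 1] + (word[i - 1] != ch),   # match or substitution
                ))
            # The row minimum never decreases as the candidate grows,
            # so a branch whose minimum exceeds t can be cut safely.
            if min(new_row) <= t:
                dfs(nxt, surface + ch, analysis + out, new_row)

    dfs(start, "", "", list(range(m + 1)))
    return results


# Hypothetical toy transducer: ev (+ler) (+de).
fst = defaultdict(list)
add_arc(fst, 0, 1, "ev", "ev+Noun")
add_arc(fst, 1, 2, "ler", "+A3pl")
add_arc(fst, 1, 3, "de", "+Loc")
add_arc(fst, 2, 3, "de", "+Loc")
finals = {1, 2, 3}

print(error_tolerant_analyze(fst, 0, finals, "evlrde", 1))
# [('evlerde', 'ev+Noun+A3pl+Loc', 1)]
```

The misspelled "evlrde" is corrected to "evlerde" and analyzed in the same pass, because the analysis string is accumulated along the very path that the edit-distance row is tracking.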
For spelling correction in Turkish, error-tolerant recognition operating with a (circular) recognizer of Turkish words, which has about 29,000 states and 119,000 transitions, can generate all candidate words within an edit distance of 1 in less than 20 milliseconds.
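Because a recognizer for agglutinative words is circular, enumerating its whole language would never terminate, yet a bounded error-tolerant search still does: once a candidate is more than t characters longer than the input, every entry of its edit-distance row exceeds t and the branch is pruned. The following is a minimal iterative sketch under assumptions, not the paper's code: it uses plain Levenshtein distance, a hypothetical dictionary encoding `delta[(state, char)] -> state`, and a tiny cyclic automaton accepting (ab)+ as a stand-in for a real word recognizer. The function name `candidates` is illustrative.

```python
def candidates(delta, start, finals, word, t):
    """Enumerate accepted strings within Levenshtein distance t of `word`.

    Terminates on cyclic automata: any candidate longer than len(word) + t
    has every row entry > t, so the prune below bounds the search depth.
    """
    m = len(word)
    # Group outgoing arcs by source state once.
    arcs = {}
    for (q, ch), r in delta.items():
        arcs.setdefault(q, []).append((ch, r))

    found = set()
    # Explicit stack instead of recursion; each entry carries its DP row.
    stack = [(start, "", list(range(m + 1)))]
    while stack:
        state, prefix, row = stack.pop()
        if state in finals and row[m] <= t:
            found.add(prefix)
        for ch, nxt in arcs.get(state, []):
            new_row = [row[0] + 1]
            for i in range(1, m + 1):
                new_row.append(min(row[i] + 1,
                                   new_row[i - 1] + 1,
                                   row[i - 1] + (word[i - 1] != ch)))
            if min(new_row) <= t:
                stack.append((nxt, prefix + ch, new_row))
    return found


# Hypothetical cyclic recognizer for (ab)+: 0 -a-> 1 -b-> 2, 2 -a-> 1.
delta = {(0, "a"): 1, (1, "b"): 2, (2, "a"): 1}
print(sorted(candidates(delta, 0, {2}, "abb", 1)))  # ['ab', 'abab']
```

The cycle 2 → 1 → 2 is followed only as long as the candidate can still come within distance 1 of "abb", which here stops the search after "abab".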