Techniques for automatically correcting words in text. ACM Computing Surveys (CSUR).
A technique for computer detection and correction of spelling errors. Communications of the ACM.
RCV1: A New Benchmark Collection for Text Categorization Research. The Journal of Machine Learning Research.
An improved error model for noisy channel spelling correction. ACL '00: Proceedings of the 38th Annual Meeting on Association for Computational Linguistics.
Text induced spelling correction. COLING '04: Proceedings of the 20th International Conference on Computational Linguistics.
We present TISC, a multilingual, language-independent and context-sensitive spelling checking and correction system designed to facilitate the automatic removal of non-word spelling errors in large corpora. Its lexicon is derived from raw text corpora, without supervision, and contains word unigrams and word bigrams. The system employs input context and lexicon evidence to automatically propose a limited number of ranked correction candidates. We describe the implemented trilingual (Dutch, English, French) prototype and evaluate it on English and Dutch text, monolingual and mixed, containing real-world errors in context.
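The abstract does not say how TISC generates or scores its candidates, so the following is only a minimal sketch of the general idea under stated assumptions. It builds an unsupervised unigram and bigram lexicon from raw text, flags a token as a non-word when the lexicon lacks it, and returns a limited, ranked list of corrections. As a swapped-in technique, candidates are generated by a single edit (deletion, transposition, substitution or insertion). They are ranked by how often each candidate forms a bigram with the left and right context words, with unigram frequency breaking ties. The function names `build_lexicon`, `edits1` and `rank_candidates` and the `limit` parameter are hypothetical and do not describe TISC's actual implementation.

```python
from collections import Counter
import re

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def tokenize(text):
    # Lowercase alphabetic tokens only: a simplifying assumption, not TISC's tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def build_lexicon(corpus_text):
    """Derive word unigram and bigram counts from raw, unannotated text."""
    tokens = tokenize(corpus_text)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def edits1(word):
    """All strings one deletion, transposition, substitution or insertion away.

    Hypothetical candidate generator (edit distance 1), swapped in for illustration.
    """
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def rank_candidates(prev_word, word, next_word, unigrams, bigrams, limit=3):
    """Return at most `limit` lexicon words for a non-word, best first.

    Ranking (an assumption): bigram evidence from the left and right context,
    then unigram frequency, then alphabetical order for determinism.
    """
    if word in unigrams:
        return []  # known word, so not a non-word error
    candidates = [c for c in edits1(word) if c in unigrams]

    def key(c):
        # Counter returns 0 for unseen bigrams without inserting them.
        context = bigrams[(prev_word, c)] + bigrams[(c, next_word)]
        return (-context, -unigrams[c], c)

    return sorted(candidates, key=key)[:limit]
```

For example, after `unigrams, bigrams = build_lexicon("the cat sat on the mat . the cat ate the rat . a hat on the cat")`, the call `rank_candidates("a", "xat", "on", unigrams, bigrams)` returns `["hat", "sat", "cat"]`. In this toy corpus "cat" is the most frequent candidate, but "hat" and "sat" rank higher because the surrounding words "a" and "on" support them, which is the kind of input-context evidence the abstract describes.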