MEDLINE® is a collection of more than 12 million references and abstracts covering recent life science literature. Because the collection keeps growing and keeps adopting cutting-edge terminology, spell checking it with a traditional lexicon-based approach requires significant manual follow-up. In this work, an internal corpus-based context quality rating α, word frequency, and simple misspelling transformations are used to rank words from most likely to least likely to be misspellings. An eleven-point average precision of 0.891 has been achieved within a class of 42,340 all-alphabetic words having an α score less than 10. Our models predict that 16,274 of these words, or 38%, are misspellings. On test data, this prediction has a recall of 79% and a precision of 86%. In other words, spell checking can be done with statistics instead of a dictionary. As an application, we examine the time history of low-α words in MEDLINE® titles and abstracts.
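The abstract does not define α or spell out its misspelling transformations. What follows is therefore only a minimal sketch of the general idea that corpus frequency plus simple transformations can flag likely typos without a dictionary. It assumes the transformations are the four single-character edits: deletion, transposition, substitution, and insertion. It also uses a hypothetical frequency-ratio rule, with illustrative parameters `max_freq` and `ratio`, to decide what counts as suspicious. This is not the authors' model, and the function names are made up for illustration.

```python
from collections import Counter
import string


def single_edits(word: str) -> set[str]:
    """Strings one deletion, transposition, substitution, or insertion away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    transposes = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    substitutes = {a + c + b[1:] for a, b in splits if b for c in letters}
    inserts = {a + c + b for a, b in splits for c in letters}
    return (deletes | transposes | substitutes | inserts) - {word}


def rank_suspects(freq: Counter, max_freq: int = 3,
                  ratio: float = 10.0) -> list[tuple[str, str, float]]:
    """Flag rare words that have a much more frequent single-edit neighbour.

    Hypothetical rule (not the paper's model): a word seen at most `max_freq`
    times is suspect if some one-edit variant occurs at least `ratio` times
    as often in the same corpus. Returns (suspect, likely_correction,
    frequency_ratio) tuples, most suspicious first.
    """
    suspects = []
    for word, count in freq.items():
        if not 0 < count <= max_freq:
            continue  # frequent words are assumed to be correctly spelled
        # The most frequent one-edit neighbour is the candidate correction.
        best = max(single_edits(word), key=lambda w: freq.get(w, 0))
        best_count = freq.get(best, 0)
        if best_count >= ratio * count:
            suspects.append((word, best, best_count / count))
    # A larger frequency ratio is taken as stronger evidence of a typo.
    suspects.sort(key=lambda s: s[2], reverse=True)
    return suspects


if __name__ == "__main__":
    tokens = (["protein"] * 50 + ["protien"] * 2 + ["cell"] * 20
              + ["celll"] + ["genome"] * 5)
    for suspect, correction, r in rank_suspects(Counter(tokens)):
        print(f"{suspect} -> {correction} (x{r:g})")
```

In this toy corpus, "protien" is ranked above "celll" because its transposition "protein" is 25 times more frequent. The corresponding ratio for "celll" and "cell" is 20. "genome" is never examined because it is not rare.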
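Eleven-point average precision is a standard interpolated retrieval measure. At each recall level 0.0, 0.1, …, 1.0, it takes the best precision reached at that recall or higher, then averages the eleven values. Below is a minimal sketch for a single ranked list of candidate words. It assumes each word carries a ground-truth misspelling label and that the total number of true misspellings is known. The paper's test data is not reproduced here, and the function name is illustrative.

```python
def eleven_point_average_precision(ranked_is_relevant: list[bool],
                                   total_relevant: int) -> float:
    """Mean interpolated precision at recall 0.0, 0.1, ..., 1.0 for one ranking."""
    if total_relevant <= 0:
        raise ValueError("need at least one relevant item")
    points = []  # (recall, precision) after each rank position
    hits = 0
    for k, relevant in enumerate(ranked_is_relevant, start=1):
        if relevant:
            hits += 1
        points.append((hits / total_relevant, hits / k))
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= level
        # (0.0 if that recall is never reached by the ranking).
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11


if __name__ == "__main__":
    # Two true misspellings, found at ranks 1 and 3.
    print(eleven_point_average_precision([True, False, True, False], 2))
```

For the example ranking, the first six recall levels get precision 1.0 and the last five get 2/3, so the score is 28/33, roughly 0.85.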
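The reported figures can be checked with a little arithmetic. Here TP counts predicted misspellings that really are misspellings, FP counts predicted misspellings that are correct words, and FN counts misspellings the model missed. The last two lines are only a back-of-the-envelope estimate. They assume the test-set precision (86%) and recall (79%) carry over to the whole class of 42,340 words, which the abstract reports only for test data.

```latex
\begin{aligned}
\frac{16\,274}{42\,340} &\approx 0.384 \approx 38\% \\
\text{precision} &= \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN} \\
TP &\approx 0.86 \times 16\,274 \approx 13\,996 \\
TP + FN &\approx \frac{13\,996}{0.79} \approx 17\,700
\end{aligned}
```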