Categorizing Unknown Words: A Decision Tree-Based Misspelling Identifier

Authors: Janine Toole
Venue: AI '99 Proceedings of the 12th Australian Joint Conference on Artificial Intelligence: Advanced Topics in Artificial Intelligence
Year: 1999

Abstract

This paper introduces a robust, portable system for categorizing unknown words. It is based on a multi-component architecture in which each component is responsible for identifying one class of unknown words. The focus of this paper is the component that identifies spelling errors. The misspelling identifier uses a decision tree architecture to combine multiple types of evidence about the unknown word. The misspelling identifier is evaluated using data from live closed captions, a genre replete with a wide variety of unknown words.
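
To make the idea of a decision tree combining several kinds of evidence more concrete, the following is a minimal Python sketch. It is not the paper's implementation: the features (minimum edit distance to a lexicon word, word length, presence of a digit), the thresholds, and every name in the code are hypothetical, chosen only for illustration. The abstract does not say which evidence types the system uses, and a real tree would presumably be learned from data rather than written by hand.

```python
from dataclasses import dataclass


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


@dataclass
class Evidence:
    """Hypothetical evidence about one unknown word (assumed features)."""
    min_edit_distance: int
    length: int
    has_digit: bool


def gather_evidence(word: str, lexicon: list[str]) -> Evidence:
    """Compute the illustrative features; the lexicon must be non-empty."""
    lowered = word.lower()
    return Evidence(
        min_edit_distance=min(edit_distance(lowered, w) for w in lexicon),
        length=len(word),
        has_digit=any(ch.isdigit() for ch in word),
    )


def is_misspelling(ev: Evidence) -> bool:
    """Hand-written stand-in for a learned tree; thresholds are assumptions."""
    if ev.has_digit:
        return False              # e.g. codes or identifiers
    if ev.min_edit_distance == 1:
        return True               # one edit away from a known word
    if ev.min_edit_distance == 2:
        return ev.length >= 6     # tolerate two edits only in longer words
    return False                  # too far from anything in the lexicon


# Toy lexicon, invented for this example.
LEXICON = ["weather", "tonight", "report"]

if __name__ == "__main__":
    for token in ["wether", "tonihgt", "zyx99", "qqqq"]:
        print(token, is_misspelling(gather_evidence(token, LEXICON)))
```

In the multi-component architecture the abstract describes, a function like `is_misspelling` would play the role of a single component; other components would handle other classes of unknown words.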