Poor estimates of context are worse than none

Authors:
William A. Gale;Kenneth W. Church
Venue:
HLT '90 Proceedings of the workshop on Speech and Natural Language
Year:
1990

References: 0
Cited by: 23

Parsing the Voyager domain using Pearl
HLT '91 Proceedings of the workshop on Speech and Natural Language

A sequential algorithm for training text classifiers
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval

A Review of Statistical Language Processing Techniques
Artificial Intelligence Review

On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence

Generalized probabilistic LR parsing of natural language (Corpora) with unification-based grammars
Computational Linguistics - Special issue on using large corpora: I

Retrieving collocations from text: Xtract
Computational Linguistics - Special issue on using large corpora: I

Dedication to William A. Gale
Natural Language Engineering

Review of "Statistical language learning" by Eugene Charniak. The MIT Press 1993.
Computational Linguistics

Pearl: a probabilistic chart parser
EACL '91 Proceedings of the fifth conference on European chapter of the Association for Computational Linguistics

Document classification using a finite mixture model
ACL '98 Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics

Towards history-based grammars: using richer models for probabilistic parsing
ACL '93 Proceedings of the 31st annual meeting on Association for Computational Linguistics

Quantifying lexical influence: giving direction to context
ACL '95 Proceedings of the 33rd annual meeting on Association for Computational Linguistics

Efficiency, robustness and accuracy in Picky chart parsing
ACL '92 Proceedings of the 30th annual meeting on Association for Computational Linguistics

Acquisition of selectional patterns
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 2

Clustering words with the MDL principle
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1

Context-based spelling correction for Japanese OCR
COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 2

Data extraction as text categorization: an experiment with the MUC-3 corpus
MUC3 '91 Proceedings of the 3rd conference on Message understanding

Probabilistic prediction and Picky chart parsing
HLT '91 Proceedings of the workshop on Speech and Natural Language

Towards history-based grammars: using richer models for probabilistic parsing
HLT '91 Proceedings of the workshop on Speech and Natural Language

Modeling of long distance context dependency in Chinese
SIGHAN '03 Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17

Modeling of long distance context dependency
COLING '04 Proceedings of the 20th international conference on Computational Linguistics

Using target-language information to train part-of-speech taggers for machine translation
Machine Translation

Speeding up target-language driven part-of-speech tagger training for machine translation
MICAI'06 Proceedings of the 5th Mexican international conference on Artificial Intelligence

Abstract

It is difficult to estimate the probability of a word's context because of sparse data problems. If appropriate care is taken, we find that it is possible to make useful estimates of contextual probabilities that improve performance in a spelling correction application. In contrast, less careful estimates are found to be useless. Specifically, we will show that the Good-Turing method makes the use of contextual information practical for a spelling corrector, while attempts to use the maximum likelihood estimator (MLE) or expected likelihood estimator (ELE) fail. Spelling correction was selected as an application domain because it is analogous to many important recognition applications based on a noisy channel model (such as speech recognition), though somewhat simpler and therefore possibly more amenable to detailed statistical analysis.
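
To make the contrast between the three estimators concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes the textbook forms: MLE as r/N, ELE as add-one-half smoothing (r + 0.5)/(N + 0.5·V), and the unsmoothed Good-Turing adjusted count r* = (r + 1)·N_{r+1}/N_r, where N_r is the number of types seen exactly r times. The function names, the toy counts, and the fallback to the raw count when N_{r+1} is zero are all illustrative assumptions; a practical system would smooth the N_r values first.

```python
from collections import Counter


def mle_prob(r, total):
    """Maximum likelihood estimate: an item seen r times out of `total`."""
    return r / total


def ele_prob(r, total, num_types):
    """Expected likelihood estimate, assumed here to be add-0.5 smoothing.

    `num_types` is the number of possible item types (seen or unseen).
    """
    return (r + 0.5) / (total + 0.5 * num_types)


def good_turing_counts(counts):
    """Map each observed count r to the adjusted count r* = (r+1) N_{r+1} / N_r.

    Illustrative simplification: when no type was seen r+1 times, the raw
    count r is kept instead of smoothing the count-of-counts.
    """
    count_of_counts = Counter(counts.values())
    adjusted = {}
    for r in sorted(count_of_counts):
        next_n = count_of_counts.get(r + 1, 0)
        if next_n > 0:
            adjusted[r] = (r + 1) * next_n / count_of_counts[r]
        else:
            adjusted[r] = float(r)
    return adjusted


def unseen_mass(counts):
    """Good-Turing estimate of total probability of unseen items: N_1 / N."""
    count_of_counts = Counter(counts.values())
    total = sum(counts.values())
    return count_of_counts.get(1, 0) / total


# Hypothetical toy context counts (not data from the paper).
toy_counts = {"a": 3, "b": 2, "c": 2, "d": 1, "e": 1, "f": 1}
toy_total = sum(toy_counts.values())

if __name__ == "__main__":
    print("MLE for an unseen context:", mle_prob(0, toy_total))
    print("ELE for an unseen context (20 types assumed):", ele_prob(0, toy_total, 20))
    print("Good-Turing adjusted counts:", good_turing_counts(toy_counts))
    print("Good-Turing mass reserved for unseen contexts:", unseen_mass(toy_counts))
```

In this sketch the MLE assigns zero probability to any unseen context, which in a noisy channel scorer would rule out a candidate correction outright, while the Good-Turing estimate discounts low counts and reserves some probability mass for contexts that never occurred in training.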