C4.5: Programs for Machine Learning.
IGTree: Using Trees for Compression and Classification in Lazy Learning Algorithms. Artificial Intelligence Review, special issue on lazy learning.
Statistical Methods for Speech Recognition.
The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.).
Forgetting Exceptions is Harmful in Language Learning. Machine Learning, special issue on natural language learning.
A Winnow-Based Approach to Context-Sensitive Spelling Correction. Machine Learning, special issue on natural language learning.
Large scale experiments on correction of confused words. ACSC '01: Proceedings of the 24th Australasian Conference on Computer Science.
Automatic Rule Acquisition for Spelling Correction. ICML '97: Proceedings of the Fourteenth International Conference on Machine Learning.
Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, special issue on using large corpora: II.
A classification approach to word prediction. NAACL 2000: Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference.
Memory-based learning: using similarity for smoothing. ACL '98: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics.
ACL '94: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics.
Scaling to very very large corpora for natural language disambiguation. ACL '01: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics.
Memory-based text correction for preposition and determiner errors. Proceedings of the Seventh Workshop on Building Educational Applications Using NLP.
We present a classification-based word prediction model built on IGTree, a decision-tree induction algorithm with favorable scaling abilities and a functional equivalence to n-gram models with back-off smoothing. In a first series of experiments, in which we train on Reuters newswire text and test either on the same type of data or on general or fictional text, we demonstrate that the system exhibits log-linear increases in prediction accuracy as the number of training examples grows. When trained on 30 million words of newswire text, the model attains prediction accuracies ranging from 12.6% on fictional text to 42.2% on newswire text. In a second series of experiments we compare all-words prediction with confusable prediction, i.e., the same task specialized to choosing among a limited set of easily confused words. Confusable prediction yields high accuracies on nine example confusable sets across all genres of text. The confusable approach outperforms the all-words approach, but the difference decreases as more training data becomes available.
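The abstract treats next-word prediction as classification with IGTree, which the authors note is functionally equivalent to an n-gram model with back-off smoothing. Below is a minimal, hypothetical Python sketch of that idea, not the authors' implementation: a trie over preceding words, ordered nearest-first as a simple stand-in for IGTree's information-gain feature ordering, that backs off to shallower context when the full context is unseen. Passing a candidate set to predict() turns all-words prediction into confusable prediction. The names BackoffPredictor, train, and predict are illustrative.

```python
# Minimal sketch of IGTree-style word prediction with back-off.
# Assumption: context features are the k preceding words, ordered
# nearest-first -- a stand-in for IGTree's information-gain ordering.

from collections import Counter


class _Node:
    __slots__ = ("counts", "children")

    def __init__(self):
        self.counts = Counter()  # next-word frequencies seen at this node
        self.children = {}       # context word -> deeper _Node


class BackoffPredictor:
    def __init__(self, context_size=3):
        self.k = context_size
        self.root = _Node()

    def train(self, tokens):
        """Store each token's preceding-word context in the trie."""
        for i, label in enumerate(tokens):
            node = self.root
            node.counts[label] += 1
            for j in range(1, self.k + 1):  # nearest preceding word first
                feat = tokens[i - j] if i - j >= 0 else "<s>"
                node = node.children.setdefault(feat, _Node())
                node.counts[label] += 1

    def predict(self, context, candidates=None):
        """Predict the next word given context = [w-1, w-2, ...].

        With `candidates`, prediction is restricted to that set
        (confusable prediction); otherwise it is all-words prediction.
        """
        # Walk as deep as the context matches, remembering the path.
        path = [self.root]
        node = self.root
        for feat in context[: self.k]:
            if feat not in node.children:
                break
            node = node.children[feat]
            path.append(node)
        # Back off: try the deepest (most specific) matching node first.
        for node in reversed(path):
            counts = node.counts
            if candidates is not None:
                counts = Counter({w: counts[w] for w in candidates if counts[w]})
            if counts:
                return counts.most_common(1)[0][0]
        return None


tokens = "the cat sat on the mat and the cat ran off".split()
model = BackoffPredictor(context_size=2)
model.train(tokens)
print(model.predict(["the", "on"]))                  # all-words -> 'mat'
print(model.predict(["the", "on"], {"mat", "cat"}))  # confusable -> 'mat'
```

In the paper's terms, restricting the classifier to a fixed candidate set corresponds to the confusable experiments in the second series; the sketch omits IGTree's actual information-gain-based feature ordering and its tree compression, which are what give the real system its favorable scaling behavior.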