In this article, we propose using suffix arrays to implement n-gram language models efficiently, with a practically unlimited n-gram size n. Combined with synchronous back-off, this approach lets us distinguish between alternative sequences using large contexts. We also show that such models can be built with additional information for each symbol, such as part-of-speech tags and dependency information. The approach can also be viewed as a collection of virtual k-testable automata: once the suffix array is built, we can directly access the results of any k-testable automaton generated from the input training data. Synchronous back-off automatically identifies the k-testable automaton with the largest feasible k. We have used this approach in several classification tasks.
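To make the idea concrete, the following is a minimal sketch, not the article's implementation. It assumes a naive suffix array built by sorting, counts an n-gram of any length by binary search, and reads synchronous back-off as "shorten the context for all alternatives at once until at least one of them has been seen". The function names (`build_suffix_array`, `count`, `synchronous_backoff`) and the toy corpus are hypothetical.

```python
def build_suffix_array(tokens):
    # Naive construction (an assumption for brevity): sort the start
    # positions by the suffix that begins there. Real systems would use
    # a faster suffix-array construction algorithm.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def _bound(tokens, sa, query, upper):
    # Binary search over the suffix array, comparing only the first
    # len(query) tokens of each suffix with the query.
    k = len(query)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = tokens[sa[mid]:sa[mid] + k]
        if prefix < query or (upper and prefix == query):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count(tokens, sa, query):
    # Number of occurrences of the n-gram `query`, for any n.
    query = list(query)
    return _bound(tokens, sa, query, True) - _bound(tokens, sa, query, False)


def synchronous_backoff(tokens, sa, left_context, alternatives):
    # Try the longest context first. All alternatives are backed off
    # together, so their counts always come from n-grams of the same size.
    for k in range(len(left_context), -1, -1):
        context = left_context[len(left_context) - k:]
        counts = {alt: count(tokens, sa, context + [alt]) for alt in alternatives}
        if any(counts.values()):
            best = max(alternatives, key=lambda alt: counts[alt])
            return best, k + 1, counts  # k + 1 is the n-gram size used
    return None, 0, {}


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ran".split()
    sa = build_suffix_array(corpus)
    print(synchronous_backoff(corpus, sa, ["on", "the"], ["cat", "mat"]))
    print(synchronous_backoff(corpus, sa, ["sat", "the"], ["cat", "mat"]))
```

In this sketch the n-gram size is never fixed in advance: the same suffix array answers queries for every n, which is what makes the "collection of virtual k-testable automata" view possible. The tokens only need to be mutually comparable, so richer symbols such as (word, part-of-speech tag) tuples could in principle be used in place of plain strings.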