Squibs and discussions: the DOP Estimation method is biased and inconsistent
Computational Linguistics
A computational model of language performance: Data Oriented Parsing
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 3
What is the minimal set of fragments that achieves maximal parse accuracy?
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
An all-subtrees approach to unsupervised parsing
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Data Oriented Parsing (DOP) is a natural language processing model that analyses new input on the basis of past experience. The underlying idea is to extract a set of fragment-probability pairs from a given treebank and to use these concrete experiences to construct analyses of new utterances. Initially, each fragment's probability was estimated from its relative frequency of occurrence. This estimator, however, was soon shown to be biased towards large corpus trees [8] and inconsistent [10]. To alleviate the effects of bias on performance, a set of heuristic constraints was put in force. Other estimators addressing these issues have since been proposed. This paper seeks to show that the most commonly used DOP estimators are in fact susceptible to strong size-sensitive bias effects, and presents a new estimation algorithm that greatly reduces the effects of this bias on performance without complicating the estimation process.
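As a rough illustration of the original relative-frequency estimator mentioned above, the following minimal sketch assigns each fragment its count divided by the total count of all fragments sharing the same root label. It assumes the fragments have already been extracted from the treebank and counted; the bracketed-string representation, the helper names `root_label` and `relative_frequency`, and the toy counts are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def root_label(fragment: str) -> str:
    """Return the root category of a bracketed fragment, e.g. "(S (NP) (VP))" -> "S"."""
    return fragment.lstrip("(").split()[0].rstrip(")")


def relative_frequency(fragment_counts):
    """Estimate P(fragment) as count(fragment) / total count of same-root fragments."""
    totals = defaultdict(int)
    for fragment, count in fragment_counts.items():
        totals[root_label(fragment)] += count
    return {
        fragment: count / totals[root_label(fragment)]
        for fragment, count in fragment_counts.items()
    }


# Toy fragment counts (illustrative only, not real treebank data).
counts = Counter({
    "(S (NP) (VP))": 2,
    "(S (NP John) (VP))": 1,
    "(S (NP) (VP sleeps))": 1,
    "(NP John)": 3,
    "(NP Mary)": 1,
})

probs = relative_frequency(counts)
for fragment, p in sorted(probs.items()):
    print(f"{fragment}: {p:.2f}")
```

Because a large corpus tree contributes far more fragments than a small one, its fragments make up a large share of these counts, which is one way to see how a size-related bias can arise under this estimator.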