Reducing Bias Effects in DOP Parameter Estimation

Authors:
Evita Linardaki
Affiliations:
Hellenic Open University, Greece, email: elinardaki@gmail.com
Venue:
Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
Year:
2008

Citing 4
Cited 0

Squibs and discussions: the DOP Estimation method is biased and inconsistent
Computational Linguistics

A computational model of language performance: Data Oriented Parsing
COLING '92 Proceedings of the 14th conference on Computational linguistics - Volume 3

What is the minimal set of fragments that achieves maximal parse accuracy?
ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics

An all-subtrees approach to unsupervised parsing
ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Abstract

Data Oriented Parsing (DOP) is a natural language processing model that analyses new input based on past experience. The underlying idea is to extract a set of fragment-probability pairs from a given treebank and use these concrete experiences to construct analyses of new utterances. Initially, probabilities were based on the fragments' relative frequency of occurrence. This estimator, however, was soon shown to be biased towards large corpus trees [8] and inconsistent [10]. To alleviate the effects of bias on performance, a set of heuristic constraints was put in force. Other estimators addressing these issues have since been proposed. This paper seeks to show that the most commonly used DOP estimators are in fact susceptible to strong size-sensitive bias effects, and to present a new estimation algorithm that greatly reduces the effect of this bias on performance without complicating the estimation process.
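
As an illustration of the relative frequency estimator mentioned in the abstract, the following minimal sketch assumes the usual formulation in which a fragment's probability is its count divided by the total count of all fragments sharing its root label. The function name `relative_frequency`, the bracketed-string encoding of fragments, and the toy counts are hypothetical and are not taken from the paper. The sketch does not implement the new estimation algorithm the paper proposes.

```python
from collections import defaultdict


def relative_frequency(fragment_counts):
    """Return count / (total count of fragments with the same root) per fragment.

    fragment_counts maps (root_label, fragment) pairs to occurrence counts.
    """
    # Sum the counts of all fragments that share a root label.
    root_totals = defaultdict(int)
    for (root, _fragment), count in fragment_counts.items():
        root_totals[root] += count

    # Normalise each fragment's count by the total for its root.
    return {
        key: count / root_totals[key[0]]
        for key, count in fragment_counts.items()
    }


# Hypothetical toy counts, for illustration only.
toy_counts = {
    ("S", "(S NP VP)"): 3,
    ("S", "(S (NP she) VP)"): 1,
    ("NP", "(NP she)"): 2,
    ("NP", "(NP he)"): 2,
}

probs = relative_frequency(toy_counts)
```

Because the number of fragments extracted from a tree grows rapidly with the tree's size, an estimator of this form lets large corpus trees contribute a disproportionate share of the counts, which is the size-sensitive bias the abstract refers to.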