Reducing Bias Effects in DOP Parameter Estimation

  • Author:
  • Evita Linardaki

  • Affiliations:
  • Hellenic Open University, Greece, email: elinardaki@gmail.com

  • Venue:
  • Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
  • Year:
  • 2008

Abstract

Data-Oriented Parsing (DOP) is a natural language processing model that analyses new input on the basis of past experience. The underlying idea is to extract a set of fragment-probability pairs from a given treebank and to use these concrete experiences to construct analyses of new utterances. Initially, fragment probabilities were estimated from the fragments' relative frequency of occurrence. This estimator, however, was soon shown to be biased towards large corpus trees [8] and inconsistent [10]. To alleviate the effect of this bias on performance, a set of heuristic constraints was imposed. Other estimators addressing these issues have since been proposed. This paper aims to show that the most commonly used DOP estimators are in fact susceptible to strong, size-sensitive bias effects, and presents a new estimation algorithm that greatly reduces the effect of bias on performance without complicating the estimation process.
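The relative-frequency estimator mentioned above can be sketched in a few lines to give some intuition for the size bias. This is a minimal illustration, not the paper's new estimator: it assumes a hypothetical toy encoding (a tree is `(label, [children])`, a word is a lowercase string, and a cut frontier nonterminal is written as its bare label), and the helper names `fragments`, `nodes` and `relative_frequency` are our own. Each fragment receives its count divided by the total count of fragments sharing its root label.

```python
import itertools
from collections import Counter

# Hypothetical toy encoding: a tree is (label, [children]); a word is a str.
# Inside a fragment, a cut (frontier) nonterminal appears as its bare label.

def fragments(node):
    """Enumerate every fragment rooted at `node` as a bracketed string.

    Each nonterminal child is either cut (kept as a frontier label) or
    expanded with one of its own fragments; words are always kept.
    """
    label, children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])
        else:
            options.append([child[0]] + fragments(child))
    return ["(" + label + " " + " ".join(combo) + ")"
            for combo in itertools.product(*options)]

def nodes(tree):
    """Yield every nonterminal node of `tree`, top-down."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def root_of(fragment):
    """Root label of a bracketed fragment, e.g. '(S NP VP)' -> 'S'."""
    return fragment[1:].split()[0]

def relative_frequency(treebank):
    """P(f) = count(f) / total count of fragments with the same root as f."""
    counts = Counter(f for tree in treebank
                     for n in nodes(tree) for f in fragments(n))
    root_totals = Counter()
    for f, c in counts.items():
        root_totals[root_of(f)] += c
    return {f: c / root_totals[root_of(f)] for f, c in counts.items()}

# Two hypothetical corpus trees of different size.
small = ("S", [("NP", [("N", ["mary"])]), ("VP", [("V", ["sleeps"])])])
large = ("S", [("NP", [("D", ["the"]), ("N", ["dog"])]),
               ("VP", [("V", ["saw"]),
                       ("NP", [("D", ["a"]), ("N", ["cat"])])])])

print(len(fragments(small)), len(fragments(large)))  # 9 55
p = relative_frequency([small, large])
print(p["(S NP VP)"])  # 0.03125, i.e. 2/64
```

In this toy treebank the smaller tree yields 9 fragments rooted at S and the larger one 55, so 55 of the 64 S-rooted fragment occurrences (about 86% of the S probability mass) come from the larger tree, even though each tree occurs once. Because the number of fragments grows roughly exponentially with tree size, this gives one intuition for why the relative-frequency estimator favours large corpus trees.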