Squibs and Discussions: The DOP Estimation Method Is Biased and Inconsistent

  • Authors: Mark Johnson
  • Affiliations: Brown University, Providence, RI
  • Venue: Computational Linguistics
  • Year: 2002

Abstract

A data-oriented parsing or DOP model for statistical parsing associates fragments of linguistic representations with numerical weights, where these weights are estimated by normalizing the empirical frequency of each fragment in a training corpus (see Bod [1998] and references cited therein). This note observes that this estimation method is biased and inconsistent; that is, the estimated distribution does not in general converge on the true distribution as the size of the training corpus increases.
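The estimation step described in the abstract can be illustrated with a minimal sketch. It assumes that each fragment is stored as a hypothetical `(root_label, fragment_string)` pair and that counts are normalized over all fragments sharing the same root label, as in the DOP1 model; the function name `estimate_fragment_weights` and the toy fragments are illustrative and not taken from the paper. The sketch shows only how the weights are computed, not the bias or inconsistency result itself.

```python
from collections import Counter, defaultdict


def estimate_fragment_weights(fragment_occurrences):
    """Relative-frequency weights for tree fragments.

    fragment_occurrences: iterable of (root_label, fragment) pairs,
    one entry per occurrence of a fragment in the training corpus.
    Assumption: counts are normalized over fragments with the same root.
    """
    counts = Counter(fragment_occurrences)

    # Total number of fragment occurrences for each root label.
    root_totals = defaultdict(int)
    for (root, _fragment), n in counts.items():
        root_totals[root] += n

    # Weight = count of the fragment / count of all fragments with that root.
    return {key: n / root_totals[key[0]] for key, n in counts.items()}


# Hypothetical toy corpus of fragment occurrences.
toy_occurrences = [
    ("S", "(S NP VP)"),
    ("S", "(S NP VP)"),
    ("S", "(S (NP she) VP)"),
    ("NP", "(NP she)"),
]
weights = estimate_fragment_weights(toy_occurrences)
```

By construction, the weights of all fragments with the same root label sum to one, so they define a distribution over fragments for each root. The note's claim concerns the distribution over trees that these weights induce, which need not converge on the true distribution as the corpus grows.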