Aspects of pattern-matching in Data-Oriented Parsing

  • Authors:
  • Guy De Pauw

  • Affiliations:
  • CNTS, University of Antwerp

  • Venue:
  • COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
  • Year:
  • 2000

Abstract

Data-Oriented Parsing (DOP) ranks among the best parsing schemes, pairing state-of-the-art parsing accuracy with the psycholinguistic insight that larger chunks of syntactic structure are relevant grammatical and probabilistic units. Parsing with the DOP model, however, seems to involve a lot of CPU cycles and a considerable amount of duplicated work, brought on by the concept of multiple derivations, which is necessary for probabilistic processing but is not convincingly related to a proper linguistic backbone. It is, however, possible to reinterpret the DOP model as a pattern-matching model, which tries to maximize the size of the substructures that construct the parse, rather than the probability of the parse. By emphasizing this memory-based aspect of the DOP model, it is possible to do away with multiple derivations, opening up possibilities for efficient Viterbi-style optimizations, while still retaining acceptable parsing accuracy through enhanced context-sensitivity.
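
The contrast between maximizing probability and maximizing substructure size can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `Fragment` type, the two scoring functions, the node counts, and the probabilities are all hypothetical, chosen only to show how the two selection criteria can disagree on the same pair of derivations.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Fragment:
    """A subtree used as one building block of a derivation (hypothetical type)."""
    label: str    # readable description of the subtree
    nodes: int    # number of tree nodes the fragment covers
    prob: float   # made-up relative frequency, for illustration only


def derivation_probability(derivation):
    """Probabilistic criterion: product of the fragment probabilities."""
    return prod(f.prob for f in derivation)


def chunk_size_score(derivation):
    """Pattern-matching criterion (an assumed formulation): prefer derivations
    built from fewer fragments, breaking ties by the largest fragment used."""
    return (-len(derivation), max(f.nodes for f in derivation))


def pick_by_probability(derivations):
    return max(derivations, key=derivation_probability)


def pick_by_size(derivations):
    return max(derivations, key=chunk_size_score)


# Two derivations of the same toy parse, S(NP(she) VP(sleeps)).
small_pieces = [
    Fragment("S -> NP VP", nodes=3, prob=0.5),
    Fragment("NP -> she", nodes=2, prob=0.4),
    Fragment("VP -> sleeps", nodes=2, prob=0.5),
]
one_big_piece = [
    Fragment("S(NP(she) VP(sleeps))", nodes=5, prob=0.02),
]
candidates = [small_pieces, one_big_piece]

by_probability = pick_by_probability(candidates)
by_size = pick_by_size(candidates)
```

With these invented numbers, the probabilistic criterion favors the derivation assembled from three small fragments, while the size-based criterion favors the single large fragment, which is the memory-based preference the abstract describes.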