Aspects of pattern-matching in Data-Oriented Parsing

Authors:
Guy De Pauw
Affiliations:
CNTS, University of Antwerp
Venue:
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Year:
2000

Abstract

Data-Oriented Parsing (DOP) ranks among the best parsing schemes, pairing state-of-the-art parsing accuracy with the psycholinguistic insight that larger chunks of syntactic structure are relevant grammatical and probabilistic units. Parsing with the DOP model, however, seems to involve a lot of CPU cycles and a considerable amount of double work, brought on by the concept of multiple derivations, which is necessary for probabilistic processing but is not convincingly related to a proper linguistic backbone. It is, however, possible to reinterpret the DOP model as a pattern-matching model, which tries to maximize the size of the substructures that construct the parse rather than the probability of the parse. By emphasizing this memory-based aspect of the DOP model, it is possible to do away with multiple derivations, opening up possibilities for efficient Viterbi-style optimizations while still retaining acceptable parsing accuracy through enhanced context-sensitivity.
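The idea of preferring large stored substructures over probability can be illustrated with a deliberately simplified sketch. It is not the algorithm of the paper: it works on flat token sequences instead of tree fragments, and the names `fewest_chunks` and `memory` are hypothetical. It only shows how a Viterbi-style dynamic program can pick the cover of an input that uses the fewest (and therefore, on average, the largest) remembered chunks, keeping one best solution per position rather than enumerating every derivation.

```python
from typing import List, Optional, Set, Tuple

Chunk = Tuple[str, ...]


def fewest_chunks(tokens: List[str], memory: Set[Chunk]) -> Optional[List[Chunk]]:
    """Cover `tokens` with as few stored chunks as possible.

    Hypothetical helper: `memory` stands in for a store of previously
    seen fragments (flat n-grams here, not DOP subtrees).
    Returns None when no cover exists.
    """
    n = len(tokens)
    # best[i] = minimal number of chunks covering tokens[:i] (None = unreachable)
    best: List[Optional[int]] = [None] * (n + 1)
    # back[i] = start index of the last chunk in the best cover of tokens[:i]
    back: List[int] = [0] * (n + 1)
    best[0] = 0

    for end in range(1, n + 1):
        for start in range(end):
            prev = best[start]
            if prev is None:
                continue
            if tuple(tokens[start:end]) not in memory:
                continue
            cost = prev + 1
            current = best[end]
            # Keep only the single best solution per position (Viterbi-style).
            if current is None or cost < current:
                best[end] = cost
                back[end] = start

    if best[n] is None:
        return None

    # Follow the back-pointers to recover the chosen chunks.
    chunks: List[Chunk] = []
    i = n
    while i > 0:
        chunks.append(tuple(tokens[back[i]:i]))
        i = back[i]
    chunks.reverse()
    return chunks
```

For example, with a memory containing the single words "the", "dog", "barks" plus the pairs "the dog" and "dog barks", the input "the dog barks" is covered by two chunks rather than three. Because each position stores one best partial cover, the search is quadratic in the input length under these assumptions; extending the idea to tree fragments would need a chart over spans and categories, which this sketch does not attempt.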