Shallow parsing on the basis of words only: a case study

  • Authors:
  • Antal van den Bosch;Sabine Buchholz

  • Affiliations:
  • Tilburg University, Tilburg, The Netherlands;Tilburg University, Tilburg, The Netherlands

  • Venue:
  • ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

We describe a case study in which a memory-based learning algorithm is trained to simultaneously chunk sentences and assign grammatical function tags to these chunks. We compare the algorithm's performance on this parsing task with varying training set sizes (yielding learning curves) and different input representations. In particular we compare input consisting of words only, a variant that includes word form information for low-frequency words, gold-standard POS only, and combinations of these. The word-based shallow parser displays an apparently log-linear increase in performance, and surpasses the flatter POS-based curve at about 50,000 sentences of training data. The low-frequency variant performs even better, and the combinations is best. Comparative experiments with a real POS tagger produce lower results. We argue that we might not need an explicit intermediate POS-tagging step for parsing when a sufficient amount of training material is available and word form information is used for low-frequency words.