Shallow parsing on the basis of words only: a case study

Authors:
Antal van den Bosch;Sabine Buchholz
Affiliations:
Tilburg University, Tilburg, The Netherlands;Tilburg University, Tilburg, The Netherlands
Venue:
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Year:
2002

Citing 18
Cited 4

Toward memory-based reasoning

Communications of the ACM - Special issue on parallelism
Information-based syntax and semantics: Vol. 1: fundamentals

Information-based syntax and semantics: Vol. 1: fundamentals
Instance-Based Learning Algorithms

Machine Learning
C4.5: programs for machine learning

C4.5: programs for machine learning
A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features

Machine Learning
A corpus-based approach to language learning

A corpus-based approach to language learning
Forgetting Exceptions is Harmful in Language Learning

Machine Learning - Special issue on natural language learning
Information Retrieval

Information Retrieval
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
A stochastic parts program and noun phrase parser for unrestricted text

ANLC '88 Proceedings of the second conference on Applied natural language processing
A maximum-entropy-inspired parser

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
A memory-based approach to learning shallow natural language patterns

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
A new statistical parser based on bigram lexical dependencies

ACL '96 Proceedings of the 34th annual meeting on Association for Computational Linguistics
Three new probabilistic models for dependency parsing: an exploration

COLING '96 Proceedings of the 16th conference on Computational linguistics - Volume 1
Scaling to very very large corpora for natural language disambiguation

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
Introduction to the CoNLL-2000 shared task: chunking

ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Single-classifier memory-based phrase chunking

ConLL '00 Proceedings of the 2nd workshop on Learning language in logic and the 4th conference on Computational natural language learning - Volume 7
Exploring evidence for shallow parsing

ConLL '01 Proceedings of the 2001 workshop on Computational Natural Language Learning - Volume 7

Preposition semantic classification via Penn Treebank and FrameNet

CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
A mission for computational natural language learning

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Unsupervised Part-of-Speech Tagging in the Large

Research on Language and Computation
New meta-grammar constructs in czech language parser synt

TSD'05 Proceedings of the 8th international conference on Text, Speech and Dialogue

Quantified Score

Hi-index	0.00

Visualization

Abstract

We describe a case study in which a memory-based learning algorithm is trained to simultaneously chunk sentences and assign grammatical function tags to these chunks. We compare the algorithm's performance on this parsing task with varying training set sizes (yielding learning curves) and different input representations. In particular we compare input consisting of words only, a variant that includes word form information for low-frequency words, gold-standard POS only, and combinations of these. The word-based shallow parser displays an apparently log-linear increase in performance, and surpasses the flatter POS-based curve at about 50,000 sentences of training data. The low-frequency variant performs even better, and the combinations is best. Comparative experiments with a real POS tagger produce lower results. We argue that we might not need an explicit intermediate POS-tagging step for parsing when a sufficient amount of training material is available and word form information is used for low-frequency words.