Morphological analysis can improve a CCG parser for English

Authors:
Matthew Honnibal; Jonathan K. Kummerfeld; James R. Curran
Affiliations:
University of Sydney; University of Sydney; University of Sydney
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year:
2010

Abstract

Because English is a low-morphology language, current statistical parsers tend to ignore morphology and accept some level of redundancy. This paper investigates how costly such redundancy is for a lexicalised grammar such as CCG. We use morphological analysis to split verb inflectional suffixes into separate tokens, so that they can receive their own lexical categories. We find that this improves accuracy when the splits are based on correct POS tags, but that errors in gold-standard or automatically assigned POS tags are costly for the system. This shows that the parser can benefit from morphological analysis, so long as the analysis is correct.
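
The suffix-splitting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the suffix inventory (`SUFFIX_FOR_TAG`), the toy stem table (`TOY_STEMS`), the function name `split_inflection`, and the choice to tag the stem as `VB` are all assumptions made for illustration. It only shows how a split that is driven by POS tags inherits any error in those tags.

```python
# Hypothetical sketch: the suffix inventory, stem table and output tags are
# illustrative assumptions, not the paper's actual morphological analyser.

# Penn Treebank inflected-verb tags mapped to an illustrative suffix token.
SUFFIX_FOR_TAG = {
    "VBD": "-ed",   # past tense
    "VBN": "-en",   # past participle
    "VBG": "-ing",  # present participle / gerund
    "VBZ": "-es",   # third person singular present
}

# Toy stem lookup standing in for a real morphological analyser.
TOY_STEMS = {
    "bought": "buy",
    "eaten": "eat",
    "running": "run",
    "sleeps": "sleep",
}


def split_inflection(tokens):
    """Split each inflected verb into a stem token and a suffix token.

    `tokens` is a list of (word, pos) pairs. A word passes through
    unchanged if its tag is not an inflected-verb tag or its stem is
    not in the toy table.
    """
    out = []
    for word, pos in tokens:
        suffix = SUFFIX_FOR_TAG.get(pos)
        stem = TOY_STEMS.get(word.lower())
        if suffix is None or stem is None:
            out.append((word, pos))
        else:
            # The stem and the suffix become separate tokens, so each
            # could be assigned its own lexical category downstream.
            out.append((stem, "VB"))
            out.append((suffix, pos))
    return out


if __name__ == "__main__":
    print(split_inflection([("She", "PRP"), ("bought", "VBD"), ("shares", "NNS")]))
    # The same word with a noun tag is left unsplit: the split depends on the POS tag.
    print(split_inflection([("running", "NN")]))
```

Because the split is keyed on the POS tag, a verb mis-tagged as a noun is never split, and a noun mis-tagged as a verb may be split wrongly. This is one way to see why the abstract reports POS errors as costly.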