Appropriately handled prosodic breaks help PCFG parsing

Authors:
Zhongqiang Huang;Mary Harper
Affiliations:
University of Maryland, College Park, MD;University of Maryland, College Park, MD and Johns Hopkins University, Baltimore, MD
Venue:
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2010

Citing 5
Cited 1

A maximum-entropy-inspired parser

NAACL 2000 Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference
Probabilistic CFG with latent annotations

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Using conditional random fields for sentence boundary detection in speech

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Effective use of prosody in parsing conversational speech

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Self-training PCFG grammars with latent annotations across languages

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2

Lessons learned in part-of-speech tagging of conversational speech

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper investigates using prosodic information in the form of ToBI break indexes for parsing spontaneous speech. We revisit two previously studied approaches, one that hurt parsing performance and one that achieved minor improvements, and propose a new method that aims to better integrate prosodic breaks into parsing. Although these approaches can improve the performance of basic probabilistic context free grammar (PCFG) parsers, they all fail to produce fine-grained PCFG models with latent annotations (PCFG-LA) (Matsuzaki et al., 2005; Petrov and Klein, 2007) that perform significantly better than the baseline PCFG-LA model that does not use break indexes, partially due to mis-alignments between automatic prosodic breaks and true phrase boundaries. We propose two alternative ways to restrict the search space of the prosodically enriched parser models to the n-best parses from the baseline PCFG-LA parser to avoid egregious parses caused by incorrect breaks. Our experiments show that all of the prosodically enriched parser models can then achieve significant improvement over the baseline PCFG-LA parser.