You talking to me?: a predictive model for zero auxiliary constructions

  • Authors:
  • Andrew Caines; Paula Buttery

  • Affiliations:
  • University of Cambridge, UK; University of Cambridge, UK

  • Venue:
  • NLPLING '10 Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground
  • Year:
  • 2010

Abstract

Because NLP tools are conventionally trained on data from written sources, they encounter problems when handling data from the spoken domain. However, accurate models of spoken data are increasingly in demand for naturalistic speech generation and for machine translation in speech-like contexts (such as chat windows and SMS). There is a widely held assumption in the linguistic field that spoken language is an impoverished form of written language. We show, however, that spoken data is not unpredictably irregular and that language models can benefit from detailed consideration of spoken language features. This paper considers one specific construction which is largely restricted to the spoken domain, the ZERO AUXILIARY, and builds a predictive model of that construction for native speakers of British English. The model predicts zero auxiliary occurrence in the BNC with 96.9% accuracy. We demonstrate how this model can be integrated into existing parsing tools, increasing the number of successful parses for the zero auxiliary construction by around 30% and thus improving the performance of NLP applications which rely on parsing.