Multiword expressions in spoken language: An exploratory study on pronunciation variation

  • Authors:
  • Diana Binnenpoorte; Catia Cucchiarini; Lou Boves; Helmer Strik

  • Affiliations:
  • Department of Linguistics, Radboud University Nijmegen, Erasmusplein 1, Nijmegen 6525 HT, The Netherlands (all four authors)

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2005

Abstract

The study presented in this paper was aimed at exploring the possibilities of modelling specific pronunciation characteristics of multiword expressions (MWEs) for both automatic speech recognition (ASR) and automatic phonetic transcription (APT). For this purpose, we first drew up an inventory of frequently found N-grams extracted from orthographic transcriptions of spontaneous speech contained in a large corpus of spoken Dutch. These N-grams were filtered and subsequently assigned to linguistic categories. For a small selection of these N-grams we examined the phonetic transcriptions contained in the corpus. We found that the pronunciation of these N-grams differed to a large extent from the canonical form. In order to determine whether this is a general characteristic of spontaneous speech or rather the effect of the specific status of these N-grams, we analysed the pronunciations of the individual words composing the N-grams in two context conditions: (1) in the N-gram context and (2) in any other context. We found that words in N-grams do indeed have peculiar pronunciation patterns. This seems to suggest that the N-grams investigated may be considered as MWEs that should be treated as lexical entries in the pronunciation lexicons used in ASR and APT, with their own specific pronunciation variants.
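The two analysis steps described in the abstract, counting frequent N-grams in orthographic transcriptions and comparing how often a word deviates from its canonical pronunciation inside versus outside a given N-gram, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`count_ngrams`, `noncanonical_rates`), the data layout, and the toy lexicon and transcriptions are all assumptions invented for illustration; none of the values come from the corpus used in the study.

```python
from collections import Counter, defaultdict


def count_ngrams(utterances, n):
    """Count word N-grams per utterance (no N-gram spans two utterances)."""
    counts = Counter()
    for words in utterances:
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def noncanonical_rates(utterances, ngram, lexicon):
    """For each word of `ngram`, return the share of non-canonical tokens
    (1) inside the N-gram context and (2) in any other context.

    `utterances` is a list of utterances, each a list of
    (word, observed_pronunciation) pairs; `lexicon` maps each word to
    its canonical pronunciation. A rate is None if no token was seen.
    """
    n = len(ngram)
    # word -> condition -> [non-canonical tokens, total tokens]
    tallies = defaultdict(lambda: {"in_ngram": [0, 0], "elsewhere": [0, 0]})
    for utt in utterances:
        words = [w for w, _ in utt]
        inside = [False] * len(utt)
        # Mark every token position covered by an occurrence of the N-gram.
        for i in range(len(words) - n + 1):
            if tuple(words[i:i + n]) == ngram:
                for j in range(i, i + n):
                    inside[j] = True
        for (word, pron), in_ctx in zip(utt, inside):
            if word not in ngram:
                continue
            cond = "in_ngram" if in_ctx else "elsewhere"
            tallies[word][cond][0] += int(pron != lexicon[word])
            tallies[word][cond][1] += 1
    return {
        word: {c: (dev / tot if tot else None) for c, (dev, tot) in conds.items()}
        for word, conds in tallies.items()
    }


# Invented toy data (hypothetical transcriptions, NOT taken from the corpus).
LEXICON = {"in": "In", "ieder": "id@r", "geval": "x@vAl", "huis": "hUs"}
UTTERANCES = [
    [("in", "In"), ("ieder", "ir"), ("geval", "xfAl")],
    [("in", "n"), ("ieder", "ir"), ("geval", "x@vAl")],
    [("ieder", "id@r"), ("huis", "hUs")],
    [("in", "In"), ("huis", "hUs")],
    [("ieder", "ir"), ("geval", "x@vAl")],
]

word_lists = [[w for w, _ in utt] for utt in UTTERANCES]
print(count_ngrams(word_lists, 3).most_common(1))
print(noncanonical_rates(UTTERANCES, ("in", "ieder", "geval"), LEXICON))
```

In this toy data each word of the trigram deviates from its canonical form more often inside the trigram than elsewhere, which is the kind of contrast the two context conditions are designed to reveal. A real analysis would also need the filtering and linguistic categorisation steps mentioned in the abstract, which this sketch omits.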