Pronunciation and writing variants in an under-resourced language: the case of Luxembourgish mobile n-deletion

  • Authors:
  • Natalie D. Snoeren; Martine Adda-Decker; Gilles Adda

  • Affiliations:
  • LIMSI-CNRS, Orsay, France; LIMSI-CNRS, Orsay, France; LIMSI-CNRS, Orsay, France

  • Venue:
  • LTC'09 Proceedings of the 4th conference on Human language technology: challenges for computer science and linguistics
  • Year:
  • 2009

Abstract

Luxembourgish, the national language of the Grand Duchy of Luxembourg, has often been characterized as one of Europe's underdescribed and under-resourced languages. Because of limited written production, poorly observed writing standardization (compared to languages such as English and French), and a large diversity of spoken varieties, the study of Luxembourgish poses many interesting challenges for automatic speech processing as well as for linguistic enquiry. In the present paper, we use large corpora to focus on typical writing variants, and the pronunciation variants derived from them, that arise in Luxembourgish from mobile -n deletion (hereafter MND). Using transcriptions of the House of Parliament debates and 10k words from news reports, we examine the reality of MND variants in written transcripts of speech. The goal of this study is threefold: to quantify the potential for variation due to MND in written Luxembourgish, to check the mandatory status of the MND rule, and to discuss the resulting problems for the automatic processing of spoken Luxembourgish.