Turkish speech corpora and recognition tools developed by porting SONIC: Towards multilingual speech recognition

  • Authors:
  • Özgül Salor; Bryan L. Pellom; Tolga Ciloglu; Mübeccel Demirekler

  • Affiliations:
  • Department of Electrical and Electronics Engineering, Middle East Technical University, 06531 Ankara, Turkey; The Center for Spoken Language Research, University of Colorado at Boulder, Boulder, CO 80309, USA; Department of Electrical and Electronics Engineering, Middle East Technical University, 06531 Ankara, Turkey; Department of Electrical and Electronics Engineering, Middle East Technical University, 06531 Ankara, Turkey

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2007

Abstract

This paper presents work on developing speech corpora and recognition tools for Turkish by porting SONIC, a speech recognition tool originally developed for English at the Center for Spoken Language Research of the University of Colorado at Boulder. The work had two objectives. The first was to collect a standard, phonetically balanced Turkish microphone speech corpus for general research use. A 193-speaker triphone-balanced audio corpus and a pronunciation lexicon for Turkish have been developed. The corpus was accepted for distribution by the Linguistic Data Consortium (LDC) of the University of Pennsylvania in October 2005, and it will serve as a standard corpus for Turkish speech researchers. The second objective was to develop speech recognition tools (a phonetic aligner and a phone recognizer) for Turkish, which provided a starting point for obtaining a multilingual speech recognizer by porting SONIC to Turkish. This part of the work was the first port of this particular recognizer to a language other than English; SONIC has subsequently been ported to over 15 languages. Using the phonetic aligner, the audio corpus has been provided with word-, phone- and HMM-state-level alignments. For the phonetic aligner, it is shown that 92.6% of the automatically labeled phone boundaries are placed within 20 ms of the manually labeled locations for the Turkish audio corpus. Finally, a phone recognition error rate of 29.2% is demonstrated for the phone recognizer.
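The two figures quoted at the end of the abstract are of a kind that is commonly computed as shown in the sketch below. This is an illustration only, not the authors' scoring procedure: the function names `boundary_agreement` and `phone_error_rate` are hypothetical, the boundaries are assumed to be already paired one to one, the 20 ms tolerance is assumed to be inclusive, and the error rate is assumed to be the Levenshtein distance between phone sequences divided by the reference length. The sample values are made up for the example.

```python
from typing import Sequence


def boundary_agreement(auto_ms: Sequence[float],
                       manual_ms: Sequence[float],
                       tol_ms: float = 20.0) -> float:
    """Fraction of automatic boundaries within tol_ms of the paired manual one.

    Assumes both lists describe the same phone sequence, so boundary i in
    auto_ms corresponds to boundary i in manual_ms.
    """
    if len(auto_ms) != len(manual_ms):
        raise ValueError("boundary lists must have the same length")
    if not auto_ms:
        raise ValueError("boundary lists must not be empty")
    hits = sum(1 for a, m in zip(auto_ms, manual_ms) if abs(a - m) <= tol_ms)
    return hits / len(auto_ms)


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance (substitutions + deletions + insertions) / len(ref)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    # prev[j] holds the distance between the reference prefix processed so
    # far and the first j hypothesis phones.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Made-up boundary times in milliseconds; three of four fall within 20 ms.
agreement = boundary_agreement([100, 250, 405, 600], [110, 245, 430, 600])

# Made-up phone strings; the hypothesis drops one of seven reference phones.
per = phone_error_rate(list("merhaba"), list("meraba"))
```

Under these assumptions, a reported 92.6% would correspond to `boundary_agreement` returning 0.926 over all paired boundaries in the evaluation set, and a 29.2% phone error rate to `phone_error_rate` returning 0.292 when aggregated over the test utterances.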