Noise robust F0 determination and epoch-marking algorithms

  • Authors:
  • Bojan Kotnik; Harald Höge; Zdravko Kačič

  • Affiliations:
  • ULTRA d.o.o., Research Center Maribor, Gosposvetska cesta 84, SI-2000 Maribor, Slovenia; Siemens AG, Corporate Technology, Professional Speech Processing, CT IC 5, Otto-Hahn-Ring 6, 81739 München, Germany; University of Maribor, Faculty of Electrical Engineering and Computer Science, Smetanova ul. 17, SI-2000 Maribor, Slovenia

  • Venue:
  • Signal Processing
  • Year:
  • 2009

Quantified Score

Hi-index 0.08

Abstract

This paper presents CPDMA, a combined pitch frequency (F0) determination and epoch (pitch period) marking procedure based on merged normalized forward-backward correlation. The algorithm consists of several processing steps: preprocessing of the input speech signal, voicing detection using artificial neural networks, F0 determination based on normalized correlation, F0 contour postprocessing using partial Viterbi traceback, and, finally, epoch (pitch period) marking. To evaluate CPDMA against other algorithms, a manually segmented reference database for pitch determination algorithms (PDA) and pitch marking algorithms (PMA) was created from the real-life Spanish SPEECON speech database. A set of criteria is proposed for evaluating the performance of any PDA, PMA, or voicing detection algorithm objectively and compactly. CPDMA was compared with the well-known, publicly available PRAAT toolkit. In both PDA and PMA performance, CPDMA significantly outperformed PRAAT in all of the considered configurations: the autocorrelation method (PRAAT_AC), the cross-correlation method (PRAAT_CC), subharmonic summation (PRAAT_SHS), and the point process (PRAAT_PP). This superior noise robustness comes at the expense of a more complex algorithm and, consequently, a worse real-time factor than PRAAT.
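
To make the F0 determination step more concrete, the following is a minimal sketch of lag search by normalized correlation in a forward and a backward direction. It is not the CPDMA implementation: the abstract does not state how the two directions are merged, so the sketch assumes a plain average, and the function names, window length, and F0 search range are hypothetical choices made for illustration only.

```python
import math


def normalized_correlation(x, start, lag, n):
    """Normalized correlation of x[start:start+n] with the same window shifted by lag."""
    a = x[start:start + n]
    b = x[start + lag:start + lag + n]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def estimate_f0(x, fs, center, win, f0_min=60.0, f0_max=400.0):
    """Return (f0_hz, score) for the frame starting at sample index `center`.

    Assumption: forward and backward scores are merged by averaging.
    """
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    if center - lag_max < 0 or center + lag_max + win > len(x):
        raise ValueError("frame too close to the signal boundary")

    best_lag, best_score = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        # Forward: compare the frame with the window one lag later.
        fwd = normalized_correlation(x, center, lag, win)
        # Backward: compare the window one lag earlier with the frame.
        bwd = normalized_correlation(x, center - lag, lag, win)
        score = 0.5 * (fwd + bwd)  # assumed merging rule
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag, best_score


# Usage: a synthetic 100 Hz tone sampled at 8 kHz (illustrative test signal).
fs = 8000
tone = [math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(800)]
f0_hz, score = estimate_f0(tone, fs, center=300, win=160)
```

For a clean periodic signal the merged score peaks at the lag equal to one pitch period. In a full system such as the one described above, per-frame candidates like these would additionally pass through voicing detection and contour postprocessing before epoch marking.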