Time-Frequency features extraction for infant directed speech discrimination

  • Authors:
  • Ammar Mahdhaoui; Mohamed Chetouani; Loic Kessous

  • Affiliations:
  • UPMC Univ Paris 06, F-75005, Paris, France; CNRS, UMR 7222 ISIR, Institut des Systèmes Intelligents et de Robotique, F-75005, Paris, France

  • Venue:
  • NOLISP'09 Proceedings of the 2009 international conference on Advances in Nonlinear Speech Processing
  • Year:
  • 2009


Abstract

In this paper we evaluate the relevance of a perceptual spectral model for automatic motherese detection. We investigated various classification techniques often used in emotion recognition: Gaussian mixture models (GMMs), support vector machines, neural networks, and k-nearest neighbors. Classification experiments were carried out on short, manually pre-segmented speech and motherese segments extracted from family home movies (mean duration approximately 3 s). An accuracy of around 86% was obtained on speaker-independent speech data, and 87.5% in the final study with speaker-dependent data. We found that a GMM trained on spectral MFCC features gives the best score, outperforming all the other single classifiers. We also found that fusing classifiers that use spectral features with classifiers that use prosodic information usually improves discrimination between motherese and normal directed speech (around 86% accuracy).
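The best-performing setup described above — one GMM per class trained on MFCC frames, with a segment labeled by the class whose model assigns the higher log-likelihood — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors are synthetic stand-ins for real 13-dimensional MFCC frames, and the component count and covariance type are assumptions.

```python
# Hedged sketch of per-class GMM classification as described in the abstract:
# fit one GMM per class (motherese vs. normal directed speech) on MFCC-like
# frames, then label a test segment by the higher total log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic 13-dim "MFCC" frames; real systems would extract these from audio.
motherese_train = rng.normal(loc=1.0, scale=1.0, size=(500, 13))
normal_train = rng.normal(loc=-1.0, scale=1.0, size=(500, 13))

# One diagonal-covariance GMM per class (4 components is an arbitrary choice).
gmm_motherese = GaussianMixture(
    n_components=4, covariance_type="diag", random_state=0
).fit(motherese_train)
gmm_normal = GaussianMixture(
    n_components=4, covariance_type="diag", random_state=0
).fit(normal_train)

def classify(frames: np.ndarray) -> str:
    """Label a segment (frames x features) by comparing the average
    per-frame log-likelihood under each class model."""
    if gmm_motherese.score(frames) > gmm_normal.score(frames):
        return "motherese"
    return "normal"

# A held-out segment drawn from the motherese-like distribution.
test_segment = rng.normal(loc=1.0, scale=1.0, size=(100, 13))
print(classify(test_segment))  # → motherese
```

The classifier-fusion result mentioned in the abstract would correspond to combining such likelihood scores with those of a prosody-based classifier (e.g. by weighted score averaging), which is not shown here.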