Time-Frequency features extraction for infant directed speech discrimination

  • Authors:
  • Ammar Mahdhaoui; Mohamed Chetouani; Loic Kessous

  • Affiliations:
  • UPMC Univ Paris 06, F-75005, Paris, France; CNRS, UMR 7222 ISIR, Institut des Systèmes Intelligents et de Robotique, F-75005, Paris, France

  • Venue:
  • NOLISP'09 Proceedings of the 2009 international conference on Advances in Nonlinear Speech Processing
  • Year:
  • 2009


Abstract

In this paper we evaluate the relevance of a perceptual spectral model for automatic motherese detection. We investigated various classification techniques often used in emotion recognition: Gaussian mixture models (GMMs), support vector machines, neural networks, and k-nearest neighbors. Classification experiments were carried out on short, manually pre-segmented speech and motherese segments extracted from family home movies (mean duration approximately 3 s). An accuracy of around 86% was obtained on speaker-independent speech data, and 87.5% in the final study with speaker-dependent data. We found that a GMM trained on spectral MFCC features gives the best score, outperforming all the other single classifiers. We also found that fusing classifiers that use spectral features with classifiers that use prosodic information usually improves discrimination between motherese and normal directed speech (around 86% accuracy).
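The best-performing setup described above — one GMM per class trained on MFCC frames, with a segment labeled by the class whose model assigns the higher log-likelihood — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors are synthetic stand-ins for real 13-dimensional MFCC frames, and the component count and covariance type are assumptions.

```python
# Hedged sketch of per-class GMM classification as described in the abstract:
# fit one GMM per class (motherese vs. normal directed speech) on MFCC-like
# frames, then label a test segment by the higher total log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic 13-dim "MFCC" frames; real systems would extract these from audio.
motherese_train = rng.normal(loc=1.0, scale=1.0, size=(500, 13))
normal_train = rng.normal(loc=-1.0, scale=1.0, size=(500, 13))

# One diagonal-covariance GMM per class (4 components is an arbitrary choice).
gmm_motherese = GaussianMixture(
    n_components=4, covariance_type="diag", random_state=0
).fit(motherese_train)
gmm_normal = GaussianMixture(
    n_components=4, covariance_type="diag", random_state=0
).fit(normal_train)

def classify(frames: np.ndarray) -> str:
    """Label a segment (frames x features) by comparing the average
    per-frame log-likelihood under each class model."""
    if gmm_motherese.score(frames) > gmm_normal.score(frames):
        return "motherese"
    return "normal"

# A held-out segment drawn from the motherese-like distribution.
test_segment = rng.normal(loc=1.0, scale=1.0, size=(100, 13))
print(classify(test_segment))  # → motherese
```

The classifier-fusion result mentioned in the abstract would correspond to combining such likelihood scores with those of a prosody-based classifier (e.g. by weighted score averaging), which is not shown here.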