openSMILE: the Munich versatile and fast open-source audio feature extractor

Authors:
Florian Eyben;Martin Wöllmer;Björn Schuller
Affiliations:
Technische Universität München, München, Germany;Technische Universität München, München, Germany;Technische Universität München, München, Germany
Venue:
Proceedings of the international conference on Multimedia
Year:
2010

Abstract

We introduce the openSMILE feature extraction toolkit, which unites feature extraction algorithms from the speech processing and the Music Information Retrieval communities. Audio low-level descriptors such as CHROMA and CENS features, loudness, Mel-frequency cepstral coefficients, perceptual linear predictive cepstral coefficients, linear predictive coefficients, line spectral frequencies, fundamental frequency, and formant frequencies are supported. Delta regression and various statistical functionals can be applied to the low-level descriptors. openSMILE is implemented in C++ with no third-party dependencies for the core functionality. It is fast, runs on Unix and Windows platforms, and has a modular, component-based architecture that makes extensions via plug-ins easy. It supports on-line incremental processing for all implemented features as well as off-line and batch processing. Numeric compatibility with future versions is ensured by means of unit tests. openSMILE can be downloaded from http://opensmile.sourceforge.net/.
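
To make "delta regression and statistical functionals applied to low-level descriptors" concrete, here is a minimal sketch in plain Python. It does not use or reproduce the openSMILE API. The function names (`delta_regression`, `functionals`), the window size, the edge padding, and the choice of functionals are all assumptions made for illustration. The delta formula is the common regression form d[t] = sum_{n=1..W} n (x[t+n] - x[t-n]) / (2 sum_{n=1..W} n^2); the abstract does not state which exact variant the toolkit implements.

```python
from statistics import mean, pstdev


def delta_regression(contour, window=2):
    """Delta coefficients of a 1-D descriptor contour (hypothetical helper).

    Uses the regression formula
        d[t] = sum_{n=1..W} n * (x[t+n] - x[t-n]) / (2 * sum_{n=1..W} n^2)
    and pads the edges by repeating the first and last frame (an assumption).
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if not contour:
        return []
    norm = 2 * sum(n * n for n in range(1, window + 1))
    last = len(contour) - 1
    deltas = []
    for t in range(len(contour)):
        acc = 0.0
        for n in range(1, window + 1):
            ahead = contour[min(t + n, last)]   # clamp at the last frame
            behind = contour[max(t - n, 0)]     # clamp at the first frame
            acc += n * (ahead - behind)
        deltas.append(acc / norm)
    return deltas


def functionals(contour):
    """Map a variable-length contour to a fixed set of summary statistics.

    The selection below is illustrative only, not the toolkit's feature set.
    """
    lo, hi = min(contour), max(contour)
    return {
        "mean": mean(contour),
        "stddev": pstdev(contour),  # population standard deviation
        "min": lo,
        "max": hi,
        "range": hi - lo,
    }


if __name__ == "__main__":
    # Hypothetical per-frame loudness values, not real measurements.
    loudness = [0.2, 0.4, 0.9, 1.3, 1.1, 0.6, 0.3]
    print(functionals(loudness))
    print(functionals(delta_regression(loudness)))
```

The point of the two-stage design is that a descriptor contour of any length, together with its delta contour, collapses into a fixed-length feature vector that a classifier can consume.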