Spoken language, in addition to serving as a primary vehicle for externalizing linguistic structures and meaning, acts as a carrier of many other sources of information, including the speaker's background, age, gender, membership in social structures, and physiological, pathological and emotional states. These sources of information are more than ancillary to the main purpose of linguistic communication: humans react to the various non-linguistic factors encoded in the speech signal, shaping and adjusting their interactions to satisfy interpersonal and social protocols. Computer science, artificial intelligence and computational linguistics have devoted much active research to systems that model the production and recovery of lexico-semantic structures from speech. Far less attention has been devoted to systems that model and understand the paralinguistic and extralinguistic information in the signal. As the breadth and nature of human-computer interaction approach levels previously reserved for human-to-human communication, there is a growing need to endow computational systems with human-like abilities that facilitate the interaction and make it more natural. Paramount among these is the human ability to make inferences regarding the affective content of our exchanges.

This thesis proposes a framework for the recognition of affective qualifiers from prosodic-acoustic parameters extracted from spoken language. It is argued that modeling the affective prosodic variation of speech can be approached by integrating acoustic parameters from several prosodic time scales, summarizing information from more localized (e.g., syllable-level) to more global (e.g., utterance-level) prosodic phenomena. In this framework, speech is structurally modeled as a dynamically evolving hierarchy whose levels are determined by prosodic constituency and contain parameters that evolve according to dynamical systems.
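The idea of summarizing a prosodic parameter track at several time scales can be illustrated with a minimal sketch. The window length, statistics, and two-level hierarchy below are illustrative assumptions, not the thesis's actual parameterization:

```python
import numpy as np

def multiscale_summary(track, frame_rate=100, syllable_ms=200):
    """Summarize a frame-level prosodic track (e.g., F0 in Hz) at two
    prosodic time scales: short syllable-like windows (local mean and
    slope) and the whole utterance (global statistics)."""
    track = np.asarray(track, dtype=float)
    win = max(1, int(frame_rate * syllable_ms / 1000))  # frames per window
    # Local (syllable-level) statistics: mean and linear slope per window.
    local = []
    for start in range(0, len(track), win):
        seg = track[start:start + win]
        t = np.arange(len(seg))
        slope = np.polyfit(t, seg, 1)[0] if len(seg) > 1 else 0.0
        local.append((seg.mean(), slope))
    # Global (utterance-level) statistics over the whole track.
    global_stats = (track.mean(), track.std(), track.max() - track.min())
    return {"syllable": local, "utterance": global_stats}

# Example: a steadily rising pitch contour sampled at 100 frames/s.
f0 = np.linspace(120.0, 180.0, 100)   # 1 s of frames
feats = multiscale_summary(f0)
```

For the rising contour above, every syllable-level window reports a positive slope while the utterance-level summary captures the overall mean and range, showing how local and global prosodic phenomena yield complementary descriptions of the same signal.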
The acoustic parameters were chosen to capture four main components of speech thought to carry paralinguistic and affect-specific information: intonation, loudness, rhythm and voice quality. The thesis addresses the contribution of each of these components separately, and evaluates the full model by testing it on datasets of acted and spontaneous speech perceptually annotated with affective labels, and by comparing it against human performance benchmarks. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
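Some of the components named above have simple frame-level acoustic correlates. The sketch below is illustrative only and is not the thesis's feature set: it estimates F0 by autocorrelation (intonation), RMS energy (loudness), and zero-crossing rate (a crude voice-quality proxy); rhythm would be derived from how such measures evolve across frames.

```python
import numpy as np

def frame_features(signal, sr=16000, frame_len=400):
    """Compute crude per-frame acoustic correlates: (F0 estimate,
    RMS energy, zero-crossing rate) for consecutive 25 ms frames."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
        # Autocorrelation peak within a plausible pitch band (60-400 Hz).
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
        lo, hi = sr // 400, sr // 60
        lag = lo + int(np.argmax(ac[lo:hi]))
        feats.append((sr / lag, rms, zcr))
    return feats

# Example: a 200 Hz sine should yield F0 estimates near 200 Hz.
sr = 16000
t = np.arange(sr) / sr
feats = frame_features(np.sin(2 * np.pi * 200 * t), sr)
```

A production system would of course use more robust estimators (e.g., cepstral or probabilistic pitch trackers, perceptual loudness models), but the sketch shows the kind of raw material a prosodic affect model operates on.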