A dynamic tonal perception model for optimal pitch stylization

Authors:
A. Origlia;G. Abete;F. Cutugno
Affiliations:
LUSI-lab, Department of Physics, University of Naples "Federico II", Naples, Italy;Department of Modern Philology, University of Naples "Federico II", Naples, Italy;LUSI-lab, Department of Physics, University of Naples "Federico II", Naples, Italy
Venue:
Computer Speech and Language
Year:
2013

Abstract

Automatic pitch stylization is an important resource for researchers working on both prosody and speech technologies. To be useful, the stylized F0 curve should contain as few control points as possible while remaining perceptually close to the original curve. This paper presents a pitch stylization algorithm that aims at the optimal balance between the number of control points and perceptual equality with respect to the original curve. Rather than being defined by statistical closeness to the original F0 curve, the quality of the stylized curve is defined on the basis of a dynamic tonal perception model. The number of control points is optimized on the basis of previous results showing that the stylization can be more radical in those areas of the signal where tone perception is less accurate, i.e. in non-prominent areas. Perceptual tests show that, with respect to perceptual equality, this approach performs as well as other reference approaches while using significantly fewer control points. Although its theoretical background employs phonological units such as syllables, the proposed phonetic approach does not require any preliminary segmentation or annotation step. Instead, it combines acoustic parameters related to syllabification and prominence detection into a single model designed to be both integrated, in the sense that it does not introduce any pitfalls in the process, and dynamic, in the sense that it does not include rigid tonal perception thresholds.
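
To make the trade-off between control points and closeness to the original curve concrete, the sketch below shows a generic piecewise-linear stylization with a per-frame tolerance. It is not the authors' dynamic tonal perception model: it is a plain recursive-split simplification (in the style of Douglas-Peucker, using vertical deviation), and the function name `stylize`, the semitone scale, and the tolerance values are all assumptions made for illustration. A larger tolerance in non-prominent frames stands in for the idea that stylization can be more radical where tone perception is less accurate.

```python
from typing import List, Sequence


def stylize(times: Sequence[float],
            f0_st: Sequence[float],
            tolerance_st: Sequence[float]) -> List[int]:
    """Return the sorted indices of the frames kept as control points.

    times        : strictly increasing frame times
    f0_st        : F0 values (assumed here to be in semitones)
    tolerance_st : allowed deviation per frame; hypothetical values, larger
                   where a frame is assumed to be less prominent
    """
    n = len(times)
    if not (n == len(f0_st) == len(tolerance_st)):
        raise ValueError("all inputs must have the same length")
    if n < 3:
        return list(range(n))

    keep = {0, n - 1}          # the end points are always control points
    stack = [(0, n - 1)]       # segments still to be checked
    while stack:
        lo, hi = stack.pop()
        worst, worst_excess = -1, 0.0
        for i in range(lo + 1, hi):
            # Value of the straight line between the two control points.
            frac = (times[i] - times[lo]) / (times[hi] - times[lo])
            interp = f0_st[lo] + frac * (f0_st[hi] - f0_st[lo])
            # How far the deviation exceeds what this frame tolerates.
            excess = abs(f0_st[i] - interp) - tolerance_st[i]
            if excess > worst_excess:
                worst, worst_excess = i, excess
        if worst >= 0:
            # Split at the frame that violates its tolerance the most.
            keep.add(worst)
            stack.append((lo, worst))
            stack.append((worst, hi))
    return sorted(keep)
```

For a rise-fall contour with a small perturbation on the rise, a loose tolerance keeps only the end points and the peak, while a tight tolerance also keeps the perturbed frames. How the tolerance should actually depend on prominence is exactly what the paper's perception model addresses, and it is left unspecified here.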