A dynamic tonal perception model for optimal pitch stylization

  • Authors:
  • A. Origlia; G. Abete; F. Cutugno

  • Affiliations:
  • LUSI-lab, Department of Physics, University of Naples "Federico II", Naples, Italy; Department of Modern Philology, University of Naples "Federico II", Naples, Italy; LUSI-lab, Department of Physics, University of Naples "Federico II", Naples, Italy

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2013

Abstract

Automatic pitch stylization is an important resource for researchers working on both prosody and speech technologies. To be useful, the stylized F0 curve should contain as few control points as possible while remaining perceptually close to the original curve. This paper presents a pitch stylization algorithm that seeks the optimal balance between the number of control points and perceptual equality with the original curve. Rather than being defined by statistical closeness to the original F0 curve, the quality of the stylized curve is defined on the basis of a dynamic tonal perception model. The number of control points is optimized on the basis of previous results showing that the stylization can be more radical in those areas of the signal where tone perception is less accurate, i.e. in non-prominent areas. Perceptual tests show that, in terms of perceptual equality, this approach performs as well as other reference approaches while using significantly fewer control points. Although its theoretical background employs phonological units such as syllables, the proposed phonetic approach does not require any preliminary segmentation or annotation step. Instead, it combines acoustic parameters related to syllabification and prominence detection into a single model designed to be both integrated, in the sense that it does not introduce any pitfalls in the process, and dynamic, in the sense that it does not include rigid tonal perception thresholds.
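
The trade-off described in the abstract, few control points against closeness to the original curve, can be illustrated with a generic piecewise-linear simplification. The sketch below is not the authors' algorithm and does not implement their tonal perception model: it is a plain recursive split (in the style of Douglas-Peucker) on a pitch contour expressed in semitones. The function names, the 100 Hz reference, and the per-sample `tolerance_st` list are all assumptions made for illustration; a larger tolerance stands in for a non-prominent region where the stylization may be more radical.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 values in Hz to semitones relative to ref_hz (assumed reference)."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def stylize(times, pitch_st, tolerance_st):
    """Return sorted indices of the control points of a piecewise-linear stylization.

    times        -- sample times, strictly increasing
    pitch_st     -- pitch at each sample, in semitones
    tolerance_st -- allowed deviation at each sample, in semitones (hypothetical
                    stand-in for perceptual accuracy: larger means less strict)
    """
    keep = {0, len(times) - 1}          # endpoints are always control points
    stack = [(0, len(times) - 1)]
    while stack:
        lo, hi = stack.pop()
        if hi - lo < 2:
            continue                    # no interior samples to check
        worst, worst_ratio = None, 1.0  # ratio > 1 means tolerance exceeded
        for i in range(lo + 1, hi):
            frac = (times[i] - times[lo]) / (times[hi] - times[lo])
            interp = pitch_st[lo] + frac * (pitch_st[hi] - pitch_st[lo])
            ratio = abs(pitch_st[i] - interp) / tolerance_st[i]
            if ratio > worst_ratio:
                worst, worst_ratio = i, ratio
        if worst is not None:
            # Add a control point at the worst offender and check both halves.
            keep.add(worst)
            stack.append((lo, worst))
            stack.append((worst, hi))
    return sorted(keep)
```

With a uniform tolerance, a symmetric rise-fall contour reduces to its two endpoints and its peak. Loosening the tolerance over part of the contour removes control points there, so an excursion of the same size is kept in the strict region and smoothed away in the loose one.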