VoIP speech quality estimation in a mixed context with genetic programming

  • Authors:
  • Adil Raja;R. Muhammad Atif Azad;Colin Flanagan;Conor Ryan

  • Affiliations:
  • University of Limerick, Limerick, Ireland;University of Limerick, Limerick, Ireland;University of Limerick, Limerick, Ireland;University of Limerick, Limerick, Ireland

  • Venue:
  • Proceedings of the 10th annual conference on Genetic and evolutionary computation
  • Year:
  • 2008

Abstract

Voice over IP (VoIP) speech quality estimation is crucial to providing optimal Quality of Service (QoS). This paper seeks to provide speech quality estimation models with better prediction accuracy by considering a richer set of input features than the current International Telecommunication Union Telecommunication Standardization Sector (ITU-T) recommendations do. It addresses a transitional phase in which wideband (WB) networks are becoming available but must, for the time being, coexist with existing narrowband (NB) setups. Quality estimation becomes a challenge in such a mixed context. The ITU-T recommendation known as the E-Model has recently been extended to handle the mixed context. However, in the WB scenario it evaluates speech degradation based solely on codec-related distortions, which are only a subset of the factors that affect speech quality on a VoIP network. Moreover, the extension is derived from speech signals rated by human subjects, an exercise that is expensive and difficult to reproduce. This paper instead also considers a number of other network distortion types, producing generalised models that predict quality degradation more accurately. To this end, an extensive set of speech samples is subjected to a wide variety of distortions. The degraded signals are then scored by the best currently available algorithmic approximation of human speech evaluation. Using the distortions as input features and the resulting quality scores as targets, we employ Genetic Programming to produce parsimonious models that show a considerable prediction gain over the E-Model. Unlike some existing approaches, in which the models are tailored to specific telephony codecs, the evolved models generalise across a variety of modern codecs.
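The modelling step described above (distortions in, quality scores out, Genetic Programming producing compact expressions) is a form of symbolic regression. The following is a minimal tree-based GP sketch of that idea. Everything in it is assumed for illustration rather than taken from the paper: the function set, grow initialisation, tournament selection, crossover and mutation rates, the depth limit, and the parsimony penalty of `alpha` times node count, used here as one simple way to favour small models. The data in the `__main__` block is a toy synthetic stand-in, not the paper's distortion features or quality scores.

```python
import math
import operator
import random

def protected_div(a, b):
    # Guard against division by (near) zero, a common GP convention.
    return a / b if abs(b) > 1e-9 else 1.0

# Assumed function set. Trees are nested tuples:
# (op, left, right), ("x", feature_index) or ("c", constant).
FUNCS = {"add": operator.add, "sub": operator.sub,
         "mul": operator.mul, "div": protected_div}

def random_tree(n_features, max_depth, rng):
    # "Grow" initialisation: become a terminal early with probability 0.3.
    if max_depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))
        return ("c", round(rng.uniform(-1.0, 1.0), 2))
    op = rng.choice(sorted(FUNCS))
    return (op, random_tree(n_features, max_depth - 1, rng),
            random_tree(n_features, max_depth - 1, rng))

def evaluate(tree, x):
    if tree[0] == "x":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return FUNCS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))

def subtrees(tree, path=()):
    # Yield (path, subtree) pairs; a path lists child indices (1 or 2).
    yield path, tree
    if tree[0] in FUNCS:
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace_at(tree, path, new):
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(parts)

def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if tree[0] in FUNCS else 0

def fitness(tree, X, y, alpha):
    # Mean squared error plus alpha * node count (parsimony pressure).
    total = 0.0
    for xi, yi in zip(X, y):
        d = evaluate(tree, xi) - yi
        total += d * d
    mse = total / len(y)
    size = sum(1 for _ in subtrees(tree))
    return mse + alpha * size if math.isfinite(mse) else math.inf

def tournament(pop, fits, rng, k=3):
    return pop[min(rng.sample(range(len(pop)), k), key=lambda i: fits[i])]

def vary(a, donor, rng, max_depth):
    # Replace a random subtree of `a` with `donor`; reject over-deep children.
    path, _ = rng.choice(list(subtrees(a)))
    child = replace_at(a, path, donor)
    return child if depth(child) <= max_depth else a

def evolve(X, y, pop_size=60, generations=25, max_depth=5, alpha=1e-3, seed=0):
    rng = random.Random(seed)
    n = len(X[0])
    pop = [random_tree(n, rng.randint(1, 3), rng) for _ in range(pop_size)]
    history = []  # best fitness per generation
    for gen in range(generations + 1):
        fits = [fitness(t, X, y, alpha) for t in pop]
        elite = min(range(pop_size), key=lambda i: fits[i])
        history.append(fits[elite])
        if gen == generations:
            return pop[elite], history
        new_pop = [pop[elite]]  # elitism: best model survives unchanged
        while len(new_pop) < pop_size:
            parent = tournament(pop, fits, rng)
            if rng.random() < 0.9:  # subtree crossover
                _, donor = rng.choice(list(subtrees(tournament(pop, fits, rng))))
            else:  # subtree mutation
                donor = random_tree(n, 2, rng)
            new_pop.append(vary(parent, donor, rng, max_depth))
        pop = new_pop

if __name__ == "__main__":
    # Toy synthetic data, NOT the paper's data: two hypothetical distortion
    # features and a made-up linear quality target.
    data_rng = random.Random(1)
    X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 5)] for _ in range(40)]
    y = [4.5 - 0.25 * a - 0.1 * b for a, b in X]
    best, hist = evolve(X, y)
    print(best, hist[0], hist[-1])
```

Because elitism copies the best model into the next generation unchanged, the best fitness recorded in `history` never increases. The size penalty and the depth limit are what steer this sketch toward compact expressions instead of bloated trees.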
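The E-Model is the baseline the evolved models are compared against. For reference, here is a minimal sketch of its narrowband (ITU-T G.107) rating chain, restricted to the codec and packet-loss terms: the effective equipment impairment, the R factor, and the R-to-MOS mapping. The function names and the folding of Ro - Is - Id into a single `base_r` parameter (defaulting to 93.2, the R value G.107 gives when every parameter is at its default) are simplifications for illustration. The wideband extension mentioned above is not covered.

```python
def ie_eff(ie, ppl, bpl, burst_r=1.0):
    """Effective equipment impairment Ie,eff (ITU-T G.107, narrowband).

    ie: codec impairment factor; ppl: packet-loss percentage;
    bpl: codec packet-loss robustness factor (must be > 0 when ppl == 0);
    burst_r: burst ratio, where 1.0 means random (non-bursty) loss.
    """
    return ie + (95.0 - ie) * ppl / (ppl / burst_r + bpl)


def r_factor(ie, ppl, bpl, burst_r=1.0, base_r=93.2, advantage=0.0):
    """Transmission rating R = Ro - Is - Id - Ie,eff + A.

    Simplification (assumption): base_r stands in for Ro - Is - Id with
    loudness, noise and delay left at G.107 defaults, so only the codec
    and packet-loss terms vary here.
    """
    return base_r - ie_eff(ie, ppl, bpl, burst_r) + advantage


def r_to_mos(r):
    """Map R to an estimated MOS using the G.107 conversion curve."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6
```

With no packet loss and Ie = 0 this gives R = 93.2 and a MOS of about 4.41; any packet loss raises Ie,eff and lowers both. Note that only codec and loss parameters enter this chain, which is the limitation the abstract points to when it argues for a richer set of input features.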