Adaptive time scale modification of speech for graceful degrading voice quality in congested networks for VoIP applications

  • Authors:
  • Hakki Gökhan Ilk;Saadettin Güler

  • Affiliations:
  • Ankara University, Department of Electronics Engineering, Besevler, Ankara, Turkey;Ankara University, Department of Electronics Engineering, Besevler, Ankara, Turkey

  • Venue:
  • Signal Processing
  • Year:
  • 2006

Abstract

This paper proposes an alternative to variable bit rate (VBR) speech coding for voice over Internet protocol (VoIP) during network congestion on the Internet. The proposed scheme, called "adaptive bit rate switching", aims to use the available bandwidth as efficiently as possible. When congestion is signaled, a time scale modification algorithm called WSOLA (waveform similarity overlap and add) is applied with a time-dependent compression rate, determined by the severity of the network congestion, in order to adaptively reduce the bit rate required to transmit speech. This approach differs from VBR speech coding and is novel in that the coder can operate at any desired bit rate for any desired duration. This is particularly useful in network environments because the load may be different in each direction. The WSOLA algorithm was selected as the time scale modification algorithm because it is computationally efficient and produces high quality output. In addition, the proposed scheme integrates WSOLA, or any other time scale modification algorithm, into any commercial or military constant bit rate (CBR) or VBR codec without any modification to the vocoder structure. The results of the proposed method are statistically evaluated using diagnostic rhyme tests (DRT) and mean opinion score (MOS) tests. The DRT results obtained from the simulation of the proposed system show, at a 90% confidence level, that the perceptual success of the adaptively compressed and G.729 coded speech is 98.92±0.03 percent. The MOS test results, on the other hand, show that the system provides better perceptual quality than standard time scale modification, indicating that the proposed system does provide graceful degradation in voice quality even in channels modeled by additive increase multiplicative decrease, provided that the dynamic network conditions grant bandwidth.
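The relationship between congestion severity, compression rate, and transmitted bit rate can be illustrated with a small sketch. It is not the authors' algorithm: the abstract does not say how the compression rate is derived from congestion, so the linear mapping, the AIMD step sizes, the floor `min_ratio`, and all function names below are assumptions made for illustration. The only figure taken from outside the abstract is the nominal 8 kbit/s rate of G.729. The idea is that if speech is time-compressed to a fraction `r` of its original duration before entering a CBR codec, the bits needed per second of original speech scale by `r`.

```python
CODEC_RATE_KBPS = 8.0  # nominal G.729 rate; the codec named in the abstract


def aimd_step(bandwidth_kbps, congested, increase_kbps=0.5, decrease_factor=0.5):
    """One additive-increase/multiplicative-decrease update of available bandwidth.

    The step sizes are illustrative assumptions, not values from the paper.
    """
    if congested:
        return bandwidth_kbps * decrease_factor
    return bandwidth_kbps + increase_kbps


def compression_ratio(bandwidth_kbps, codec_rate_kbps=CODEC_RATE_KBPS, min_ratio=0.5):
    """Fraction of the original duration to keep (1.0 means no compression).

    Assumed linear mapping: keep just enough duration for the CBR stream to fit
    the available bandwidth, clamped to [min_ratio, 1.0]. The floor is a
    hypothetical limit on how far the time scale is compressed.
    """
    ratio = bandwidth_kbps / codec_rate_kbps
    return max(min_ratio, min(1.0, ratio))


def effective_rate_kbps(ratio, codec_rate_kbps=CODEC_RATE_KBPS):
    """Bit rate per second of original speech after time compression."""
    return ratio * codec_rate_kbps


def simulate(congestion_events, start_kbps=CODEC_RATE_KBPS):
    """Return the compression ratio chosen after each congestion signal."""
    bandwidth = start_kbps
    schedule = []
    for congested in congestion_events:
        bandwidth = aimd_step(bandwidth, congested)
        schedule.append(compression_ratio(bandwidth))
    return schedule


if __name__ == "__main__":
    # One congestion event followed by three clear intervals.
    for ratio in simulate([True, False, False, False]):
        print(f"ratio={ratio:.4f}  rate={effective_rate_kbps(ratio):.2f} kbit/s")
```

In this sketch the ratio would be handed to a time scale modification routine such as WSOLA ahead of the unmodified codec, which matches the abstract's point that the vocoder structure itself is left untouched. When the available bandwidth falls below `min_ratio` times the codec rate, the clamped stream no longer fits, so a real system would need some further fallback.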