Adaptive time scale modification of speech for graceful degrading voice quality in congested networks for VoIP applications

Authors:
Hakki Gökhan Ilk; Saadettin Güler
Affiliations:
Ankara University, Department of Electronics Engineering, Besevler, Ankara, Turkey; Ankara University, Department of Electronics Engineering, Besevler, Ankara, Turkey
Venue:
Signal Processing
Year:
2006

Abstract

This paper proposes an alternative to variable bit rate (VBR) speech coding for voice over Internet protocol (VoIP) during network congestion on the Internet. The proposed scheme, called "adaptive bit rate switching", ensures that the available bandwidth is used as efficiently as possible. When congestion is signaled, a time scale modification algorithm called WSOLA (waveform similarity overlap and add) is applied with a time-dependent compression rate, determined by the severity of the network congestion, in order to adaptively reduce the bit rate required to transmit speech. This approach differs from VBR speech coding and is novel in the sense that the coder can operate at any desired bit rate for any desired duration. This is particularly useful in network environments because the load may differ in each direction. WSOLA was selected as the time scale modification algorithm because it is computationally efficient and produces high quality output. In addition, the proposed scheme integrates WSOLA, or any other time scale modification algorithm, into any commercial or military constant bit rate (CBR) or VBR codec without modifying the vocoder structure. The results of the proposed method are evaluated statistically using diagnostic rhyme tests (DRT) and mean opinion score (MOS) tests. The DRT results obtained from simulation of the proposed system show, at a 90% confidence level, that the perceptual success of the adaptively compressed and G.729 coded speech is 98.92±0.03 percent. The MOS test results, in turn, show that the system provides better perceptual quality than standard time scale modification, indicating that the proposed system indeed provides graceful degradation in voice quality even in channels modeled by additive increase multiplicative decrease, provided that the dynamic network conditions grant bandwidth.
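The relationship between the compression rate and the transmitted bit rate can be illustrated with a small sketch. It is not the authors' implementation: the linear mapping from available bandwidth to time-scale factor, the floor `MIN_SCALE`, the AIMD step sizes, and all function names are assumptions made for illustration. The only figures taken from outside this sketch are the codec named in the abstract (G.729) and its nominal 8 kbit/s rate. The idea shown is that if speech is shortened to a fraction `alpha` of its original duration before a constant bit rate codec, one second of original speech costs roughly `alpha` times the codec rate.

```python
CODEC_RATE_KBPS = 8.0  # nominal G.729 bit rate
MIN_SCALE = 0.5        # assumed floor on the time-scale factor (not from the paper)


def time_scale_factor(available_kbps, codec_kbps=CODEC_RATE_KBPS,
                      min_scale=MIN_SCALE):
    """Return alpha = compressed duration / original duration.

    Assumed linear rule: pick alpha so that alpha * codec_kbps fits the
    available bandwidth, clamped to the range [min_scale, 1.0].
    """
    if available_kbps <= 0 or codec_kbps <= 0:
        raise ValueError("rates must be positive")
    return max(min_scale, min(1.0, available_kbps / codec_kbps))


def effective_rate_kbps(alpha, codec_kbps=CODEC_RATE_KBPS):
    """Bit rate spent per second of original (uncompressed) speech."""
    return alpha * codec_kbps


def aimd_step(bandwidth_kbps, congested, increase_kbps=0.5,
              decrease_factor=0.5, ceiling_kbps=CODEC_RATE_KBPS):
    """One additive increase, multiplicative decrease update (assumed steps)."""
    if congested:
        return bandwidth_kbps * decrease_factor
    return min(ceiling_kbps, bandwidth_kbps + increase_kbps)


def simulate(congestion_events, start_kbps=CODEC_RATE_KBPS):
    """Return a list of (bandwidth_kbps, alpha) pairs, one per interval."""
    bandwidth = start_kbps
    schedule = []
    for congested in congestion_events:
        bandwidth = aimd_step(bandwidth, congested)
        schedule.append((bandwidth, time_scale_factor(bandwidth)))
    return schedule


if __name__ == "__main__":
    # One clear interval, one congestion event, then two clear intervals.
    for bandwidth, alpha in simulate([False, True, False, False]):
        print(bandwidth, alpha, effective_rate_kbps(alpha))
```

Under these assumed parameters, a single congestion event halves the bandwidth estimate to 4 kbit/s and the time-scale factor drops to 0.5; the factor then recovers to 0.5625 and 0.625 over the next two clear intervals. A time scale modification algorithm such as WSOLA would be the component that actually shortens the speech by the chosen factor; that step is not implemented here.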