A low-latency lip-synchronized videoconferencing system

Authors: Milton Chen
Affiliations: Stanford University, Stanford, CA
Venue: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Year: 2003

Abstract

Audio is presented ahead of video in some videoconferencing systems since audio requires less time to process. Audio could be delayed to synchronize with video to achieve lip synchronization; however, the overall audio latency might then become unacceptable. We built a videoconferencing system to achieve lip synchronization with minimal perceived audio latency. Instead of adding a fixed audio delay, our system time-stretches the audio at the beginning of each utterance until the audio is synchronized with the video. We conducted user studies and found that (1) audio could lead video by roughly 50 msec and still be perceived as synchronized; (2) audio could lead video by 300 msec and still be perceived as synchronized if the audio was time-stretched to synchronization within a short period; and (3) our algorithm appears to strike a favorable balance between minimizing audio latency and supporting lip synchronization.
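
The arithmetic behind time-stretching toward synchronization can be illustrated with a minimal sketch. It is not the paper's implementation: the function names and the stretch factor of 1.5 are hypothetical, and the sketch assumes a constant stretch factor (output duration divided by input duration) applied from the start of an utterance. Under that assumption, each millisecond of source audio delays the audio stream by (stretch factor − 1) ms relative to the video, so the initial lead shrinks linearly until it reaches zero.

```python
def stretch_plan(lead_ms: float, stretch_factor: float) -> tuple[float, float]:
    """Return (source_ms, playout_ms) needed to absorb an initial audio lead.

    stretch_factor is output duration / input duration (assumed constant);
    it must exceed 1 so that the audio is actually slowed down.
    """
    if lead_ms < 0:
        raise ValueError("lead_ms must be non-negative")
    if stretch_factor <= 1.0:
        raise ValueError("stretch_factor must be greater than 1")
    # Each ms of source audio takes stretch_factor ms to play out, so it
    # pushes the audio back by (stretch_factor - 1) ms relative to video.
    source_ms = lead_ms / (stretch_factor - 1.0)
    playout_ms = source_ms * stretch_factor
    return source_ms, playout_ms


def audio_lead_at(t_ms: float, initial_lead_ms: float, stretch_factor: float) -> float:
    """Remaining audio lead after t_ms of playout since the utterance began."""
    # Source audio consumed in t_ms of playout is t_ms / stretch_factor.
    absorbed = t_ms * (stretch_factor - 1.0) / stretch_factor
    return max(0.0, initial_lead_ms - absorbed)


if __name__ == "__main__":
    # Hypothetical numbers: a 300 ms lead and a stretch factor of 1.5.
    source_ms, playout_ms = stretch_plan(300.0, 1.5)
    print(f"source audio stretched: {source_ms:.0f} ms")
    print(f"time until synchronized: {playout_ms:.0f} ms")
    for t in (0, 300, 600, 900):
        print(t, audio_lead_at(t, 300.0, 1.5))
```

A larger stretch factor reaches synchronization sooner but distorts the speech more. The sketch does not model that trade-off, nor the perceptual thresholds measured in the user studies.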