A low-latency lip-synchronized videoconferencing system

  • Authors:
  • Milton Chen

  • Affiliations:
  • Stanford University, Stanford, CA

  • Venue:
  • Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
  • Year:
  • 2003

Quantified Score

Hi-index 0.01

Visualization

Abstract

Audio is presented ahead of video in some videoconferencing systems since audio requires less time to process. Audio could be delayed to synchronize with video to achieve lip synchronization; however, the overall audio latency might then become unacceptable. We built a videoconferencing system to achieve lip synchronization with minimal perceived audio latency. Instead of adding a fixed audio delay, our system time-stretches the audio at the beginning of each utterance until the audio is synchronized with the video. We conducted user studies and found that (1) audio could lead video by roughly 50 msec and still be perceived as synchronized; (2) audio could lead video by 300 msec and still be perceived as synchronized if the audio was time-stretched to synchronization within a short period; and (3) our algorithm appears to strike a favorable balance between minimizing audio latency and supporting lip synchronization.