Automatic video editing system using stereo-based head tracking for multiparty conversation

Authors:
Yoshinao Takemae;Kazuhiro Otsuka;Junji Yamato
Affiliations:
NTT Corporation, Kanagawa, Japan;NTT Corporation, Kanagawa, Japan;NTT Corporation, Kanagawa, Japan
Venue:
CHI '05 Extended Abstracts on Human Factors in Computing Systems
Year:
2005

Abstract

This paper presents an automatic video editing system based on head tracking for multiparty conversations. Systems that record meetings and those that support teleconferences are attracting considerable interest. Conventional systems use a fixed-viewpoint camera and simple camera selection based on participants' utterances, but they fail to adequately convey to the viewer who is talking to whom. We focus on the participants' head orientation, since this information is useful for detecting the speaker and the person the speaker is addressing. To automatically estimate each participant's head orientation, our system combines several modules for stereo-based head tracking. The system selects the shot of the participant whom most participants are looking at, based on majority decision. Experiments on several 3-participant conversations confirm the effectiveness of our system. The results show that our system more successfully conveys who is talking to whom, a crucial piece of information that allows the viewer to better understand conversation content.
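
The majority-decision shot selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the head-tracking stage has already mapped each participant's head orientation to the index of the participant being looked at (or None when no target can be determined). The function name select_shot, the current_shot parameter, and the tie-breaking rule are hypothetical choices made for illustration; the abstract does not specify how ties are resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def select_shot(
    gaze_targets: Sequence[Optional[int]],
    current_shot: Optional[int] = None,
) -> Optional[int]:
    """Pick the participant whom most participants are looking at.

    gaze_targets[i] is the index of the participant that participant i
    is judged to be looking at, or None if no target was determined.
    """
    # Count one vote per participant, ignoring missing targets and
    # (as an assumption) any self-directed estimate.
    votes = Counter(
        target
        for viewer, target in enumerate(gaze_targets)
        if target is not None and target != viewer
    )
    if not votes:
        # Assumption: with no usable votes, keep the current shot.
        return current_shot

    top = max(votes.values())
    leaders = sorted(p for p, n in votes.items() if n == top)

    # Assumption: on a tie, prefer the current shot to avoid needless
    # cuts; otherwise fall back to the lowest participant index.
    if current_shot in leaders:
        return current_shot
    return leaders[0]


# In a 3-participant conversation, participants 0 and 2 look at
# participant 1, so the shot of participant 1 is selected.
print(select_shot([1, 2, 1]))
```

In practice such a vote would be taken per frame or per short time window, and some temporal smoothing would likely be needed to keep the shot from switching too often; those details are outside what the abstract states.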