Audiovisual celebrity recognition in unconstrained web videos

Authors:
Mehmet Emre Sargin; Hrishikesh Aradhye; Pedro J. Moreno; Ming Zhao
Affiliations:
Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA;Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA;Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA;Google Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA
Venue:
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Year:
2009

Citing 0
Cited 1

Audio-visual grouplet: temporal audio-visual interactions for general video concept classification

MM '11 Proceedings of the 19th ACM international conference on Multimedia

Abstract

The number of video clips available online is growing at a tremendous pace. Conventionally, user-supplied metadata text, such as the title of the video and a set of keywords, has been the only source of indexing information for user-uploaded videos. Automated extraction of video content for unconstrained, large-scale video databases is a challenging and as yet unsolved problem. In this paper, we present an audiovisual celebrity recognition system aimed at automatic tagging of unconstrained web videos. Prior work on audiovisual person recognition relied on the assumption that the person in the video is speaking and that the features extracted from the audio and visual domains are associated with each other throughout the video. However, this assumption does not hold for unconstrained web videos. The proposed method finds the audiovisual mapping and hence improves upon the association assumption. Considering the scale of the application, all pieces of the system are trained automatically without any human supervision. We present results on 26,000 videos and show the effectiveness of the method on a per-celebrity basis.