Audio-visual grouplet: temporal audio-visual interactions for general video concept classification
MM '11 Proceedings of the 19th ACM international conference on Multimedia
Hi-index | 0.00 |
The number of video clips available online is growing at a tremendous pace. Conventionally, user-supplied metadata text, such as the title of the video and a set of keywords, has been the only source of indexing information for user-uploaded videos. Automated extraction of video content for unconstrained and large scale video databases is a challenging and yet unsolved problem. In this paper, we present an audiovisual celebrity recognition system towards automatic tagging of unconstrained web videos. Prior work on audiovisual person recognition relied on the fact that the person in the video is speaking and the features extracted from audio and visual domain are associated with each other throughout the video. However, this assumption is not valid on unconstrained web videos. Proposed method finds the audiovisual mapping and hence improve upon the association assumption. Considering the scale of the application, all pieces of the system are trained automatically without any human supervision. We present the results on 26,000 videos and show the effectiveness of the method per-celebrity basis.