ShotTagger: tag location for internet videos

Authors:
Guangda Li;Meng Wang;Yan-Tao Zheng;Haojie Li;Zheng-Jun Zha;Tat-Seng Chua
Affiliations:
NUS Graduate School for Integrative Sciences and Engineering and National University of Singapore;National University of Singapore;National University of Singapore;Dalian University of Technology;National University of Singapore;National University of Singapore
Venue:
Proceedings of the 1st ACM International Conference on Multimedia Retrieval
Year:
2011

Abstract

Social video sharing websites allow users to annotate videos with descriptive keywords called tags, which greatly facilitate video search and browsing. However, many tags describe only part of the video content and give no temporal indication of when the tagged content actually appears. Currently, there is very little research on automatically assigning tags to shot-level segments of a video. In this paper, we leverage users' tags as a source for analyzing the content within a video and develop a novel system named ShotTagger to assign tags at the shot level. Tags are located at the shot level in two steps. The first estimates the distribution of tags within the video, based on a multiple instance learning framework. The second exploits the semantic correlation of a tag with the other tags of the video in an optimization framework and imposes temporal smoothness across adjacent video shots to refine the shot-level tagging results. We present different applications to demonstrate the usefulness of the tag location scheme in searching and browsing videos. A series of experiments conducted on a set of YouTube videos has demonstrated the feasibility and effectiveness of our approach.
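
The abstract does not give the formulation of the refinement step, so the following is only an illustrative sketch of the general idea of temporal smoothness across adjacent shots, not the ShotTagger algorithm itself. It assumes that some earlier stage (for example, a multiple instance learning estimator) has already produced one relevance score per shot for a given tag. The function names `smooth_shot_scores` and `localize_tag`, the blending weight `alpha`, and the `threshold` are all hypothetical choices made for this example, and the semantic correlation between tags is deliberately left out.

```python
def smooth_shot_scores(initial, alpha=0.5, iterations=50):
    """Blend each shot's initial tag score with the mean score of its
    temporally adjacent shots (hypothetical helper, not from the paper).

    alpha = 0 keeps the initial scores unchanged; larger alpha gives the
    neighboring shots more influence.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    scores = list(initial)
    n = len(scores)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Adjacent shots are the previous and next shot, when they exist.
            neighbors = [scores[j] for j in (i - 1, i + 1) if 0 <= j < n]
            if neighbors:
                neighbor_mean = sum(neighbors) / len(neighbors)
            else:
                neighbor_mean = scores[i]  # single-shot video
            updated.append((1 - alpha) * initial[i] + alpha * neighbor_mean)
        scores = updated
    return scores


def localize_tag(scores, threshold=0.5):
    """Return the indices of shots whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    # Made-up scores for one tag over seven shots of one video.
    raw = [0.1, 0.9, 0.2, 0.8, 0.9, 0.1, 0.0]
    refined = smooth_shot_scores(raw, alpha=0.5)
    print(localize_tag(refined, threshold=0.5))
```

In this sketch, an isolated high score is pulled down toward its neighbors, while a run of consecutive high scores reinforces itself, which is the intuition behind preferring temporally coherent tag locations.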