Product annotation in videos is of great importance for video browsing, search, and advertising. However, most existing automatic video annotation research focuses on high-level concepts, such as events, scenes, and object categories. This article presents a novel solution for annotating specific products in videos by mining information from the Web. It collects a set of high-quality training images for each product by jointly leveraging Amazon and the Google image search engine. A visual signature for each product is then built from the bag-of-visual-words representation of the training images, and a correlative sparsification approach removes noisy bins from these signatures. The signatures are finally matched against video frames to produce annotations. Experiments on more than 1,000 videos demonstrate the feasibility and effectiveness of the approach.
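To make the pipeline concrete, the following is a minimal sketch of the bag-of-visual-words signature step described above. It is an illustration only, not the authors' implementation: the visual vocabulary here is a random stand-in (in practice it would be learned by clustering local descriptors such as SIFT), and the sparsification is approximated by simply zeroing low-weight bins, whereas the paper uses a correlative sparsification approach.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 50-word visual vocabulary of 128-D centroids
# (in practice: k-means over SIFT descriptors from training images).
vocab = rng.random((50, 128))

def bovw_signature(descriptors, vocabulary):
    """Quantize local descriptors to their nearest visual word and
    return an L1-normalized histogram as the image's signature."""
    dists = np.linalg.norm(
        descriptors[:, None, :] - vocabulary[None, :, :], axis=2
    )
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

def sparsify(signature, keep=10):
    """Crude stand-in for signature sparsification: keep only the
    `keep` largest bins and renormalize."""
    out = np.zeros_like(signature)
    top = np.argsort(signature)[-keep:]
    out[top] = signature[top]
    return out / out.sum()

# Stand-in for local descriptors extracted from one training image.
descriptors = rng.random((200, 128))
sig = sparsify(bovw_signature(descriptors, vocab))
```

Annotation then reduces to comparing a frame's signature against each product's sparsified signature (e.g. by histogram intersection or cosine similarity) and assigning the best-matching product above a threshold.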