Learning Tags from Unsegmented Videos of Multiple Human Actions

  • Authors:
  • Timothy M. Hospedales, Shaogang Gong, Tao Xiang

  • Venue:
  • ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
  • Year:
  • 2011


Abstract

Providing methods to support semantic interaction with growing volumes of video data is an increasingly important challenge for data mining. To this end, there has been some success in recognizing simple objects and actions in video; however, most of this work requires strongly supervised training data. The supervision cost of these approaches therefore renders them economically non-scalable for real-world applications. In this paper we address the problem of learning to annotate and retrieve semantic tags of human actions in realistic video data, given only sparsely provided tags of semantically salient activities. This is challenging because (1) the learning problem is multi-label and (2) realistic videos are often dominated by (semantically uninteresting) background activity unsupported by any tag of interest, leading to a severe irrelevant-data problem. To address these challenges, we introduce a new topic-model-based approach to video tag annotation. Our model simultaneously learns a low-dimensional representation of the video data, which of its dimensions are semantically relevant (i.e., supported by tags), and how to annotate videos with tags. Experimental evaluation on three different video action/activity datasets demonstrates the difficulty of this problem and the value of our contribution.
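To make the general idea concrete (this is not the authors' model, whose details are in the paper), the sketch below implements a Labeled-LDA-style collapsed Gibbs sampler in the same spirit: each topic is tied to one tag, so the learned low-dimensional representation is semantically relevant by construction, and one extra untied "background" topic can absorb activity unsupported by any tag, a simple stand-in for handling irrelevant data. All function names, parameters, and hyperparameter values here are illustrative assumptions.

```python
# Minimal sketch of tag-tied topic modelling for multi-label video
# annotation, assuming each video is a bag of visual-word ids and each
# training video carries a (possibly sparse) set of tag ids.
import numpy as np

def gibbs_labeled_lda(docs, doc_tags, n_tags, vocab_size,
                      n_background=1, alpha=0.1, beta=0.01,
                      n_iter=200, seed=0):
    """docs: list of visual-word-id lists; doc_tags: list of tag-id lists."""
    rng = np.random.default_rng(seed)
    K = n_tags + n_background          # one topic per tag, plus background
    n_kw = np.zeros((K, vocab_size))   # topic-word counts
    n_dk = np.zeros((len(docs), K))    # document-topic counts
    n_k = np.zeros(K)                  # total words per topic
    z, allowed = [], []
    for d, doc in enumerate(docs):
        # A training video may only use its own tags' topics + background,
        # so topic dimensions stay tag-aligned (semantically relevant).
        ok = np.array(list(doc_tags[d]) + list(range(n_tags, K)))
        allowed.append(ok)
        zd = rng.choice(ok, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_kw[k, w] += 1; n_dk[d, k] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            ok = allowed[d]
            for i, w in enumerate(doc):
                k = z[d][i]            # remove token's current assignment
                n_kw[k, w] -= 1; n_dk[d, k] -= 1; n_k[k] -= 1
                # Collapsed Gibbs conditional, restricted to allowed topics.
                p = (n_dk[d, ok] + alpha) * \
                    (n_kw[ok, w] + beta) / (n_k[ok] + beta * vocab_size)
                k = ok[rng.choice(len(ok), p=p / p.sum())]
                z[d][i] = k            # resample and restore counts
                n_kw[k, w] += 1; n_dk[d, k] += 1; n_k[k] += 1
    return n_kw, n_dk
```

Under this sketch, annotating an unseen video would amount to running the same sampler on it with all tag topics allowed and ranking tags by the inferred per-tag topic proportions, with the background topic soaking up visual words that no tag explains well.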