Attribute learning for understanding unstructured social activity

  • Authors:
  • Yanwei Fu; Timothy M. Hospedales; Tao Xiang; Shaogang Gong

  • Affiliations:
  • School of EECS, Queen Mary University of London, UK (all authors)

  • Venue:
  • ECCV'12: Proceedings of the 12th European Conference on Computer Vision, Part IV
  • Year:
  • 2012


Abstract

The rapid development of social video sharing platforms has created a huge demand for automatic video classification and annotation techniques, in particular for videos containing social activities of a group of people (e.g. YouTube videos of a wedding reception). Recently, attribute learning has emerged as a promising paradigm for transferring learning to sparsely labelled classes in object or single-object short action classification. In contrast to existing work, this paper is the first to tackle attribute learning for understanding group social activities with sparse labels. This problem is more challenging because of the complex multi-object nature of social activities and the unstructured nature of the activity context. To solve this problem, we (1) contribute an unstructured social activity attribute (USAA) dataset with both visual and audio attributes, (2) introduce the concept of a semi-latent attribute space and (3) propose a novel model for learning the latent attributes, which alleviates the dependence of existing models on exact and exhaustive manual specification of the attribute space. We show that our framework is able to exploit latent attributes to outperform contemporary approaches on a variety of realistic multimedia sparse-data learning tasks, including multi-task learning, N-shot transfer learning, learning with label noise and, importantly, zero-shot learning.
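
The headline use case named in the abstract, zero-shot recognition through a shared attribute space, can be illustrated with a generic direct-attribute-prediction style sketch. The snippet below is a hypothetical toy on synthetic data, not the paper's semi-latent attribute model or the USAA dataset: per-attribute binary classifiers are trained on clips of seen classes, and unseen classes are recognised purely from manually specified attribute signatures.

```python
# Minimal sketch (assumed toy setup, not the authors' model): zero-shot
# classification via per-attribute classifiers and class-attribute signatures.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_attr = 5

# Seen classes, each defined by a binary attribute signature (toy values).
seen_sig = np.array([[1, 0, 1, 0, 1],
                     [0, 1, 0, 1, 0]])
y_seen = rng.integers(0, 2, size=300)
A_seen = seen_sig[y_seen]                    # per-clip attribute labels
X_seen = rng.normal(size=(300, 20))
X_seen[:, :n_attr] += 2.0 * A_seen           # features carry attribute signal

# One binary classifier per attribute, trained only on seen-class clips.
attr_clfs = [LogisticRegression(max_iter=1000).fit(X_seen, A_seen[:, a])
             for a in range(n_attr)]

def zero_shot_predict(X, signatures):
    """Assign each clip to the unseen class whose attribute signature best
    matches the predicted attribute probabilities (independent Bernoulli)."""
    p = np.column_stack([c.predict_proba(X)[:, 1] for c in attr_clfs])  # (N, A)
    log_like = (np.log(p + 1e-9) @ signatures.T
                + np.log(1 - p + 1e-9) @ (1 - signatures).T)            # (N, C)
    return log_like.argmax(axis=1)

# Unseen classes: no training clips, only attribute signatures.
unseen_sig = np.array([[1, 1, 0, 0, 1],
                       [0, 0, 1, 1, 0]])
y_unseen = rng.integers(0, 2, size=100)
X_unseen = rng.normal(size=(100, 20))
X_unseen[:, :n_attr] += 2.0 * unseen_sig[y_unseen]

acc = (zero_shot_predict(X_unseen, unseen_sig) == y_unseen).mean()
print(f"zero-shot accuracy on unseen classes: {acc:.2f}")
```

The sketch relies entirely on a hand-specified attribute space; the paper's contribution, by contrast, is to augment such user-defined attributes with learned latent ones so that recognition does not hinge on exact and exhaustive manual specification.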