Semi-supervised learning for semantic video retrieval

  • Authors:
  • Ralph Ewerth; Bernd Freisleben

  • Affiliations:
  • University of Marburg, Marburg, Germany (both authors)

  • Venue:
  • Proceedings of the 6th ACM international conference on Image and video retrieval
  • Year:
  • 2007


Abstract

The automatic understanding of audiovisual content for multimedia retrieval is a difficult task, since the meaning and appearance of a certain event or concept are strongly determined by contextual information. For example, the appearance of a high-level concept, such as a map or a news anchor, depends on the editing layout, which is usually characteristic of a particular broadcasting station. In this paper, we show that it is possible to adaptively learn the appearance of certain objects or events for a particular test video by utilizing unlabeled data in order to improve a subsequent retrieval process. First, an initial model is obtained via supervised learning using a set of appropriate training videos. Then, this initial model is used to rank the shots of each test video v separately. This ranking is used to label the most relevant and most irrelevant shots in video v for subsequent use as training data in a semi-supervised learning process. Based on these automatically labeled training data, relevant features are selected for the concept under consideration for video v. Then, two additional classifiers are trained on the automatically labeled data of this video. AdaBoost and Support Vector Machines (SVM) are incorporated for feature selection and ensemble classification. Finally, the newly trained classifiers and the initial model form an ensemble. Experimental results on TRECVID 2005 video data demonstrate the feasibility of the proposed learning scheme for certain high-level concepts.
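The per-video self-training loop described in the abstract (rank shots with the initial model, pseudo-label the most and least relevant ones, train a video-specific classifier, and combine it with the initial model in an ensemble) can be sketched as follows. This is a minimal illustration, not the authors' implementation: a toy nearest-centroid classifier stands in for the SVM/AdaBoost models of the paper, and all function and variable names are hypothetical.

```python
# Hedged sketch of a per-video self-training scheme: a nearest-centroid
# classifier replaces the paper's SVM/AdaBoost models; names are illustrative.

def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

class NearestCentroid:
    """Toy stand-in for the video-specific classifiers of the paper."""
    def fit(self, pos, neg):
        self.cp, self.cn = centroid(pos), centroid(neg)
        return self
    def score(self, x):
        # Higher score = closer to the "relevant" centroid.
        return dist(x, self.cn) - dist(x, self.cp)

def self_train_rank(initial_score, shots, k):
    """Rank the shots of one test video, pseudo-label the k highest- and
    k lowest-ranked shots, train an adapted classifier on these automatic
    labels, and re-rank with an ensemble of both models."""
    ranked = sorted(shots, key=initial_score, reverse=True)
    pos, neg = ranked[:k], ranked[-k:]          # automatically labeled data
    adapted = NearestCentroid().fit(pos, neg)   # video-specific model
    # Ensemble: average the initial and the adapted scores.
    return sorted(
        shots,
        key=lambda x: 0.5 * initial_score(x) + 0.5 * adapted.score(x),
        reverse=True,
    )

# Toy usage: 1-D shot "features"; the initial model scores by first coordinate.
shots = [[0.9], [0.1], [0.8], [0.2], [0.7]]
result = self_train_rank(lambda x: x[0], shots, k=2)
print(result[0])  # most relevant shot after adaptation
```

In the paper the re-ranking step also performs per-concept feature selection on the pseudo-labeled data before training the two additional classifiers; that step is omitted here for brevity.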