Video Event Classification Using Bag of Words and String Kernels

Authors:
Lamberto Ballan;Marco Bertini;Alberto Del Bimbo;Giuseppe Serra
Affiliations:
Media Integration and Communication Center, University of Florence, Italy (all authors)
Venue:
ICIAP '09 Proceedings of the 15th International Conference on Image Analysis and Processing
Year:
2009

Abstract

The recognition of events in videos is a relevant and challenging task of automatic semantic video analysis. At present, one of the most successful frameworks used for object recognition tasks is the bag-of-words (BoW) approach. However, this approach does not model the temporal information of the video stream. In this paper we present a method to introduce temporal information within the BoW approach. Events are modeled as sequences of histograms of visual features, computed from each frame using the traditional BoW model. The sequences are treated as strings in which each histogram is considered a character. Because the sequences vary in length with the duration of the video clip, they are classified using SVM classifiers with a string kernel based on the Needleman-Wunsch edit distance. Experimental results on two datasets, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
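
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the chi-square distance used as the substitution cost between two histograms, the fixed gap cost, the exponential form of the kernel, and every function and parameter name below are assumptions made for illustration only.

```python
import math


def chi2_distance(h1, h2):
    """Chi-square distance between two histograms.

    Assumed choice of per-character distance; the abstract does not
    say which histogram distance is used.
    """
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def nw_distance(seq_a, seq_b, gap_cost=1.0, sub_cost=chi2_distance):
    """Needleman-Wunsch global alignment cost between two sequences.

    Each sequence is a list of per-frame BoW histograms ("characters").
    gap_cost is a hypothetical penalty for inserting or deleting a frame.
    """
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = minimal cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap_cost
    for j in range(1, m + 1):
        dp[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + sub_cost(seq_a[i - 1], seq_b[j - 1]),
                dp[i - 1][j] + gap_cost,  # frame of seq_a aligned to a gap
                dp[i][j - 1] + gap_cost,  # frame of seq_b aligned to a gap
            )
    return dp[n][m]


def string_kernel(seq_a, seq_b, gamma=1.0):
    """Similarity derived from the alignment cost (assumed exponential form)."""
    return math.exp(-gamma * nw_distance(seq_a, seq_b))


def gram_matrix(sequences, gamma=1.0):
    """Pairwise kernel values for a list of variable-length sequences."""
    return [[string_kernel(a, b, gamma) for b in sequences] for a in sequences]


# Toy clips of different lengths over a two-word visual vocabulary.
clip_a = [[1.0, 0.0], [0.0, 1.0]]
clip_b = [[1.0, 0.0]]
K = gram_matrix([clip_a, clip_b])
```

A Gram matrix of this kind could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Kernels derived from edit distances are not guaranteed to be positive semi-definite, so this should be read as a sketch of the idea rather than a validated kernel.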