SIFT-Bag kernel for video event analysis

Authors:
Xi Zhou;Xiaodan Zhuang;Shuicheng Yan;Shih-Fu Chang;Mark Hasegawa-Johnson;Thomas S. Huang
Affiliations:
UIUC, Urbana, IL, USA;UIUC, Urbana, IL, USA;NUS, Singapore, Singapore;Columbia University, NY, NY, USA;UIUC, Urbana, IL, USA;UIUC, Urbana, IL, USA
Venue:
MM '08 Proceedings of the 16th ACM international conference on Multimedia
Year:
2008

Citing 23
Cited 19

A Bayesian Computer Vision System for Modeling Human Interactions

IEEE Transactions on Pattern Analysis and Machine Intelligence
The Earth Mover's Distance as a Metric for Image Retrieval

International Journal of Computer Vision
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval

ECML '98 Proceedings of the 10th European Conference on Machine Learning
Coupled hidden Markov models for complex action recognition

CVPR '97 Proceedings of the 1997 Conference on Computer Vision and Pattern Recognition (CVPR '97)
Object Labelling from Human Action Recognition

PERCOM '03 Proceedings of the First IEEE International Conference on Pervasive Computing and Communications
Object Recognition from Local Scale-Invariant Features

ICCV '99 Proceedings of the International Conference on Computer Vision-Volume 2 - Volume 2
Space-time Interest Points

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Recognizing Action at a Distance

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Video Google: A Text Retrieval Approach to Object Matching in Videos

ICCV '03 Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2
Recognizing Human Actions: A Local SVM Approach

ICPR '04 Proceedings of the Pattern Recognition, 17th International Conference on (ICPR'04) Volume 3 - Volume 03
Semi-Supervised Adapted HMMs for Unusual Event Detection

CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 1 - Volume 01
Efficient Visual Event Detection Using Volumetric Features

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
Detecting Irregularities in Images and in Video

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1 - Volume 01
The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features

ICCV '05 Proceedings of the Tenth IEEE International Conference on Computer Vision - Volume 2
Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories

CVPR '06 Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Volume 2
Large-Scale Concept Ontology for Multimedia

IEEE MultiMedia
Evaluation campaigns and TRECVid

MIR '06 Proceedings of the 8th ACM international workshop on Multimedia information retrieval
Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study

International Journal of Computer Vision
Behavior recognition via sparse spatio-temporal features

ICCCN '05 Proceedings of the 14th International Conference on Computer Communications and Networks
Trademark matching and retrieval in sports video databases

Proceedings of the international workshop on Workshop on multimedia information retrieval
Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment

IEEE Transactions on Pattern Analysis and Machine Intelligence
A Thousand Words in a Scene

IEEE Transactions on Pattern Analysis and Machine Intelligence
An efficient and effective region-based image retrieval framework

IEEE Transactions on Image Processing

Efficient object localization with gaussianized vector representation

IMCE '09 Proceedings of the 1st international workshop on Interactive multimedia for consumer electronics
Descriptive visual words and visual phrases for image applications

MM '09 Proceedings of the 17th ACM international conference on Multimedia
Using cross-media correlation for scene detection in travel videos

Proceedings of the ACM International Conference on Image and Video Retrieval
Consumer photo management and browsing facilitated by near-duplicate detection with feature filtering

Journal of Visual Communication and Image Representation
Building contextual visual vocabulary for large-scale image applications

Proceedings of the international conference on Multimedia
Video based mobile location search with large set of SIFT points in cloud

Proceedings of the 2010 ACM multimedia workshop on Mobile cloud media computing
Event detection and recognition for semantic annotation of video

Multimedia Tools and Applications
Personalization in multimedia retrieval: A survey

Multimedia Tools and Applications
Top-down cues for event recognition

ACCV'10 Proceedings of the 10th Asian conference on Computer vision - Volume Part III
A fast MAP adaptation technique for gmm-supervector-based video semantic indexing systems

MM '11 Proceedings of the 19th ACM international conference on Multimedia
Exploring probabilistic localized video representation for human action recognition

Multimedia Tools and Applications
Toward a higher-level visual representation for content-based image retrieval

Multimedia Tools and Applications
Metric learning for large scale image classification: generalizing to new classes at near-zero cost

ECCV'12 Proceedings of the 12th European conference on Computer Vision - Volume Part II
Reordering video shots for event classification using bag-of-words models and string kernels

Proceedings of the 27th Conference on Image and Vision Computing New Zealand
A reward-and-punishment-based approach for concept detection using adaptive ontology rules

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)
Exact and easy guidance with visual navigation situation for mobile user

Proceedings of the Fifth International Conference on Internet Multimedia Computing and Service
Learning latent spatio-temporal compositional model for human action recognition

Proceedings of the 21st ACM international conference on Multimedia
Compact bag-of-words visual representation for effective linear classification

Proceedings of the 21st ACM international conference on Multimedia
Social-oriented visual image search

Computer Vision and Image Understanding

Abstract

In this work, we present a SIFT-Bag based generative-to-discriminative framework for video event recognition in unconstrained news videos. In the generative stage, each video clip is encoded as a bag of SIFT feature vectors, the distribution of which is described by a Gaussian Mixture Model (GMM). In the discriminative stage, the SIFT-Bag Kernel is designed to characterize the Kullback-Leibler divergence between the specialized GMMs of any two video clips, and this kernel is then used for supervised learning in two ways. On the one hand, the kernel's discriminating power is further refined for centroid-based video event classification by the Within-Class Covariance Normalization approach, which suppresses the kernel components that vary strongly among video clips of the same event. On the other hand, the SIFT-Bag Kernel is used in a Support Vector Machine for margin-based video event classification. Finally, the outputs of these two classifiers are fused for the final decision. Experiments on the TRECVID 2005 corpus show that the framework raises the mean average precision from the best previously reported 38.2% [36] to 60.4%.
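
The abstract does not give the kernel formula, so the following is only a minimal sketch of one common way to build a kernel from a Kullback-Leibler divergence bound between two GMMs. It assumes, beyond what the abstract states, that both clip-specific GMMs share the same mixture weights and diagonal covariances and differ only in their component means. Under that assumption, each clip maps to a "supervector" of scaled means, and half the squared distance between two supervectors equals an upper bound on the KL divergence. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def supervector(means, weights, variances):
    """Stack per-component means, each scaled by sqrt(w_k) / sigma_k.

    Assumption: weights and diagonal variances are shared by all clips.
    """
    vec = []
    for mu_k, w_k, var_k in zip(means, weights, variances):
        for m, v in zip(mu_k, var_k):
            vec.append(math.sqrt(w_k) * m / math.sqrt(v))
    return vec


def linear_kernel(u, v):
    """Inner product of two supervectors."""
    return sum(a * b for a, b in zip(u, v))


def kl_upper_bound(means_a, means_b, weights, variances):
    """Weighted sum of per-component Gaussian KL divergences.

    With equal covariances, each term is half a Mahalanobis distance.
    """
    total = 0.0
    for mu_a, mu_b, w_k, var_k in zip(means_a, means_b, weights, variances):
        total += w_k * sum((a - b) ** 2 / v
                           for a, b, v in zip(mu_a, mu_b, var_k))
    return 0.5 * total


# Toy example: 2 mixture components, 2-dimensional features (made-up values).
weights = [0.5, 0.5]
variances = [[1.0, 4.0], [1.0, 1.0]]
clip_a = [[0.0, 0.0], [2.0, 2.0]]   # adapted means for clip A
clip_b = [[1.0, 2.0], [2.0, 0.0]]   # adapted means for clip B

phi_a = supervector(clip_a, weights, variances)
phi_b = supervector(clip_b, weights, variances)

k_ab = linear_kernel(phi_a, phi_b)
sq_dist = linear_kernel(phi_a, phi_a) - 2 * k_ab + linear_kernel(phi_b, phi_b)
bound = kl_upper_bound(clip_a, clip_b, weights, variances)

print("kernel value:", k_ab)
print("0.5 * squared supervector distance:", 0.5 * sq_dist)
print("KL upper bound:", bound)
```

Because the kernel in this sketch is a plain inner product in supervector space, it could be passed to any kernel classifier as a precomputed Gram matrix. The Within-Class Covariance Normalization and classifier fusion steps mentioned in the abstract are not shown here.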