Video Event Recognition Using Kernel Methods with Multilevel Temporal Alignment

Authors:
Dong Xu; Shih-Fu Chang
Affiliations:
Nanyang Technological University, Singapore; Columbia University, New York
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2008

Abstract

In this work, we systematically study the problem of event recognition in unconstrained news video sequences. We adopt a discriminative kernel-based method, for which video clip similarity plays an important role. First, we represent a video clip as a bag of orderless descriptors extracted from all of its constituent frames and apply the Earth Mover's Distance (EMD) to integrate similarities among frames from two clips. Observing that a video clip usually comprises multiple subclips corresponding to the evolution of an event over time, we further build a multi-level temporal pyramid. At each pyramid level, we integrate the information from different subclips with integer-value-constrained EMD to explicitly align the subclips. By fusing the information from the different pyramid levels, we develop Temporally Aligned Pyramid Matching (TAPM) for measuring video similarity. We conduct comprehensive experiments on the TRECVID 2005 corpus, which contains more than 6,800 clips. Our experiments demonstrate that 1) the multi-level TAPM method clearly outperforms single-level EMD, and 2) single-level EMD outperforms keyframe-based and multi-frame-based detection methods by a large margin. In addition, we conduct an in-depth investigation of various aspects of the proposed techniques, such as weight selection in single-level EMD, sensitivity to temporal clustering, the effect of temporal alignment, and possible approaches for speedup.
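The subclip alignment and level fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that both clips have the same number of equally weighted subclips at a pyramid level, in which case an integer-constrained EMD can be treated as a minimum-cost one-to-one matching. The function names `align_subclips` and `fuse_levels`, the precomputed distance matrix, the level weights, and the exponential mapping from distance to similarity are all hypothetical choices made for illustration.

```python
from itertools import permutations
from math import exp


def align_subclips(dist):
    """Minimum-cost one-to-one matching between subclips.

    dist[i][j] is an assumed, precomputed distance between subclip i of
    the first clip and subclip j of the second (square, non-empty).
    Brute force over permutations is adequate for the few subclips at
    a pyramid level; an assignment solver would replace it at scale.
    """
    n = len(dist)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        # Average distance of the matched subclip pairs.
        cost = sum(dist[i][perm[i]] for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, list(best_perm)


def fuse_levels(level_distances, level_weights, scale=1.0):
    """Combine per-level distances into one similarity value.

    Uses a weighted sum of distances followed by exp(-d / scale);
    both the weights and the scale are illustrative assumptions.
    """
    total = sum(w * d for w, d in zip(level_weights, level_distances))
    return exp(-total / scale)


if __name__ == "__main__":
    # Two clips, two subclips each: the events occur in swapped order.
    level1 = [[0.9, 0.1],
              [0.2, 0.8]]
    cost, matching = align_subclips(level1)
    print(matching)  # subclip i of clip A is matched to matching[i] of clip B
    print(fuse_levels([0.5, cost], [0.5, 0.5]))
```

In this toy example the matching pairs each subclip with its counterpart in the other clip even though the temporal order is swapped, which is the kind of misalignment that explicit alignment is meant to absorb; a fixed in-order comparison would yield a much larger distance for the same pair of clips.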