Segmental multi-way local pooling for video recognition

Authors:
Ilseo Kim;Sangmin Oh;Arash Vahdat;Kevin Cannons;A.G. Amitha Perera;Greg Mori
Affiliations:
Kitware Inc., Clifton Park, NY, USA;Kitware Inc., Clifton Park, NY, USA;Simon Fraser University, Burnaby, BC, Canada;Simon Fraser University, Burnaby, BC, Canada;Kitware Inc., Clifton Park, NY, USA;Simon Fraser University, Burnaby, BC, Canada
Venue:
Proceedings of the 21st ACM international conference on Multimedia
Year:
2013

Abstract

In this work, we address the problem of complex event detection in unconstrained videos. We introduce a novel multi-way feature pooling approach that leverages segment-level information. The approach is simple and applies to diverse audio-visual features. It uses a set of clusters discovered by unsupervised clustering of segment-level features; depending on the feature characteristics, these can include not only scene-based clusters but also motion- and audio-based clusters. Every video is then represented by multiple descriptors, each designed to relate to one of the pre-built clusters. For classification, we use intersection kernel SVMs, where the kernel is obtained by combining multiple kernels computed from corresponding per-cluster descriptor pairs. Evaluation on the TRECVID'11 MED dataset shows that the proposed approach significantly improves on the state of the art.
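
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that segment features are small non-negative vectors, that the cluster centers have already been built (for example by k-means), that each segment is assigned to its nearest center, that pooling within a cluster is a plain average, and that per-cluster kernels are combined by a uniform average. The abstract specifies none of these choices, and all function and variable names are hypothetical.

```python
from math import dist


def nearest_cluster(segment, centers):
    """Index of the center closest (Euclidean distance) to a segment feature."""
    return min(range(len(centers)), key=lambda k: dist(segment, centers[k]))


def per_cluster_descriptors(segments, centers):
    """Average-pool a video's segment features separately within each cluster.

    Average pooling and hard nearest-center assignment are assumptions.
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for seg in segments:
        k = nearest_cluster(seg, centers)
        counts[k] += 1
        for j, value in enumerate(seg):
            sums[k][j] += value
    # A cluster with no assigned segment keeps an all-zero descriptor.
    return [
        [s / counts[k] for s in sums[k]] if counts[k] else sums[k]
        for k in range(len(centers))
    ]


def intersection_kernel(x, y):
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def combined_kernel(desc_a, desc_b):
    """Uniform average of per-cluster intersection kernels (assumed rule)."""
    per_cluster = [intersection_kernel(a, b) for a, b in zip(desc_a, desc_b)]
    return sum(per_cluster) / len(per_cluster)


# Toy data: two pre-built clusters and two videos given as lists of segments.
centers = [[1.0, 0.0], [0.0, 1.0]]
video_a = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
video_b = [[0.1, 0.9], [0.6, 0.4]]

desc_a = per_cluster_descriptors(video_a, centers)
desc_b = per_cluster_descriptors(video_b, centers)
k_ab = combined_kernel(desc_a, desc_b)
print(round(k_ab, 6))
```

Evaluating combined_kernel for every pair of training videos gives a Gram matrix, which could then be passed to any SVM implementation that accepts precomputed kernels.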