We investigate the challenging problem of joint audio-visual analysis of generic videos for concept detection. We extract a novel local representation, the Audio-Visual Atom (AVA), defined as a region track associated with regional visual features and audio onset features. We develop a hierarchical algorithm to extract visual atoms from generic videos, and we locate energy onsets in the corresponding soundtrack by time-frequency analysis. Audio atoms are then extracted around these energy onsets. Visual and audio atoms together form AVAs, from which discriminative audio-visual codebooks are constructed for concept detection. Experiments on Kodak's consumer benchmark videos confirm the effectiveness of our approach.
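The abstract describes the audio side only as "time-frequency analysis" that locates energy onsets, with audio atoms taken around them. As a hedged illustration, the sketch below assumes a spectral-flux onset detector over a Hann-windowed short-time Fourier transform (a common textbook choice, not necessarily the authors' method) and treats an "audio atom" simply as a fixed-width sample window centred on each onset. The function names and parameters (`stft_magnitudes`, `detect_onsets`, `atom_windows`, `frame_len`, `hop`, `rel_threshold`, `half_width`) are hypothetical, and the features the paper actually computes inside each atom are not reproduced here.

```python
import cmath
import math
from typing import List, Tuple


def stft_magnitudes(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Magnitude spectrum of each Hann-windowed frame (naive DFT, O(frame_len^2) per frame)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [
            abs(sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)  # non-negative frequency bins only
        ]
        frames.append(spectrum)
    return frames


def detect_onsets(signal: List[float], frame_len: int, hop: int, rel_threshold: float = 0.3) -> List[int]:
    """Hypothetical helper: onset positions (in samples) as thresholded peaks of spectral flux."""
    mags = stft_magnitudes(signal, frame_len, hop)
    # Spectral flux: summed positive magnitude increase per bin between consecutive frames.
    flux = [0.0] + [
        sum(max(0.0, cur - prev) for cur, prev in zip(mags[t], mags[t - 1]))
        for t in range(1, len(mags))
    ]
    threshold = rel_threshold * max(flux, default=0.0)
    onsets = []
    for t in range(1, len(flux)):
        nxt = flux[t + 1] if t + 1 < len(flux) else float("-inf")
        # Keep local maxima that clear the threshold; report the frame's start sample.
        if flux[t] > threshold and flux[t] >= flux[t - 1] and flux[t] > nxt:
            onsets.append(t * hop)
    return onsets


def atom_windows(onsets: List[int], half_width: int, n_samples: int) -> List[Tuple[int, int]]:
    """Hypothetical 'audio atom' extent: [start, end) sample ranges centred on each onset, clipped."""
    return [(max(0, o - half_width), min(n_samples, o + half_width)) for o in onsets]


if __name__ == "__main__":
    # Synthetic soundtrack: silence, then a tone starting at sample 1024.
    sig = [0.0] * 1024 + [math.sin(2 * math.pi * 0.1 * n) for n in range(1024)]
    found = detect_onsets(sig, frame_len=64, hop=32)
    print(found, atom_windows(found, half_width=128, n_samples=len(sig)))
```

The naive DFT keeps the sketch dependency-free; a real pipeline would use an FFT routine and a more robust adaptive threshold. Reporting the frame start means an onset is localized only to within about one frame length.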