We investigate general concept classification in unconstrained videos through joint audio-visual analysis. A novel representation, the Audio-Visual Grouplet (AVG), is extracted by studying statistical temporal audio-visual interactions. An AVG is a set of audio and visual codewords grouped together according to their strong temporal correlations in videos. AVGs carry unique audio-visual cues for representing video content, from which an audio-visual dictionary can be constructed for concept classification. By using entire AVGs as building elements, this dictionary is much more robust than traditional vocabularies built from individual audio or visual codewords. Specifically, we perform coarse-level foreground/background separation in both the audio and visual channels, and discover four types of AVGs by exploring mixed-and-matched temporal correlations among the visual foreground, visual background, audio foreground, and audio background. All four types of AVGs provide discriminative audio-visual patterns for classifying various semantic concepts. We extensively evaluate the method on the large-scale Columbia Consumer Video set. Experiments demonstrate that AVG-based dictionaries achieve consistent and significant performance improvements over other state-of-the-art approaches.
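The core idea, grouping audio and visual codewords by the strength of their temporal correlation, can be illustrated with a minimal sketch. This is a hypothetical simplification, not the paper's actual AVG construction: it assumes each codeword's per-frame activation is available as a time series and pairs codewords whose Pearson correlation exceeds a threshold, ignoring the foreground/background separation step.

```python
import numpy as np

def discover_grouplets(audio_ts, visual_ts, threshold=0.8):
    """Pair audio and visual codewords with strongly correlated activations.

    audio_ts  : (T, A) array -- per-frame activation of A audio codewords
    visual_ts : (T, V) array -- per-frame activation of V visual codewords
    Returns a list of (audio_index, visual_index) pairs (toy stand-ins
    for the paper's grouplets).
    """
    # Standardize each codeword's time series to zero mean, unit variance.
    a = (audio_ts - audio_ts.mean(0)) / (audio_ts.std(0) + 1e-9)
    v = (visual_ts - visual_ts.mean(0)) / (visual_ts.std(0) + 1e-9)
    # Pearson correlation between every audio/visual codeword pair, shape (A, V).
    corr = a.T @ v / audio_ts.shape[0]
    # Keep only strongly positively correlated pairs.
    return [tuple(p) for p in np.argwhere(corr >= threshold)]

# Toy demo: audio codeword 0 fires in sync with visual codeword 1.
audio = np.array([[1, 0], [0, 1], [1, 0], [0, 1],
                  [1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
visual = np.array([[1, 1], [1, 0], [0, 1], [0, 0],
                   [1, 1], [1, 0], [0, 1], [0, 0]], dtype=float)
print(discover_grouplets(audio, visual))  # [(0, 1)]
```

In the full method the resulting grouplets, rather than single codewords, become the entries of the audio-visual dictionary used for classification.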