Human Interaction Recognition (HIR) in uncontrolled TV video material is a very challenging problem because of the large intra-class variability (due to differences in how actions are performed, lighting conditions, and camera viewpoints, among others) and the small inter-class variability (e.g., the visual difference between a hug and a kiss is very subtle). Most previous work has focused only on visual information (i.e., the image signal), thus missing an important source of information present in human interactions: the audio. So far, such visual-only approaches have not been shown to be discriminative enough. This work proposes the use of an Audio-Visual Bag of Words (AVBOW) as a more powerful mechanism for approaching the HIR problem than the traditional Visual Bag of Words (VBOW). We show that the combined use of video and audio information yields better classification results than video alone. Our approach has been validated on the challenging TVHID dataset, showing that the proposed AVBOW provides statistically significant improvements over the VBOW employed in the related literature.
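To make the idea of an audio-visual bag of words concrete, the following is a minimal sketch only. The abstract does not say which descriptors, codebooks, or fusion scheme the authors use, so everything here is an assumption made for illustration: the function names (`nearest_codeword`, `bow_histogram`, `avbow`), the toy codebooks, and the choice to fuse the two modalities by concatenating their normalized histograms.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to the descriptor
    (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(codebook):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors: Sequence[Vector],
                  codebook: Sequence[Vector]) -> List[float]:
    """Count how often each codeword is the nearest one, then L1-normalize."""
    counts = [0.0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else counts


def avbow(visual_descriptors: Sequence[Vector],
          visual_codebook: Sequence[Vector],
          audio_descriptors: Sequence[Vector],
          audio_codebook: Sequence[Vector]) -> List[float]:
    """Assumed fusion: concatenate the visual and audio histograms.
    This is an illustrative choice, not necessarily the authors' method."""
    return (bow_histogram(visual_descriptors, visual_codebook)
            + bow_histogram(audio_descriptors, audio_codebook))


# Toy data (hypothetical): 2-D "visual" and 1-D "audio" descriptors.
visual_codebook = [(0.0, 0.0), (1.0, 1.0)]
audio_codebook = [(0.0,), (1.0,), (2.0,)]
visual_descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (0.8, 0.9)]
audio_descriptors = [(0.2,), (1.9,)]

clip_vector = avbow(visual_descriptors, visual_codebook,
                    audio_descriptors, audio_codebook)
print(clip_vector)  # [0.25, 0.75, 0.5, 0.0, 0.5]
```

In this sketch each video clip becomes one fixed-length vector that a standard classifier could consume. A visual-only (VBOW) baseline would correspond to using the first histogram alone.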