In this paper, we propose a novel cross-media correlation detection method for movie keyframe retrieval. We first compute the temporal saliency of the video and audio streams of a movie separately, then locate the resonance regions where the saliency changes in the two modalities are strongly correlated. Next, starting from these resonance regions, we propagate the similarity of visual and auditory characteristics to neighboring movie regions using a temporal movie context model, segmenting the movie into a sequence of coherent parts from which keyframes are extracted. Experimental results on real movie clips show that, compared with single-modality algorithms, our method improves retrieval completeness and precision by efficiently exploiting the temporal context and the correlations between the complementary modalities.
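The first step of the pipeline, locating resonance regions where saliency changes in the two streams co-occur, can be sketched as a windowed correlation over the per-frame saliency signals. This is a minimal illustration under assumed inputs: the function name, window size, and correlation threshold below are illustrative choices, not the formulation used in the paper.

```python
import numpy as np

def resonance_regions(video_saliency, audio_saliency, win=8, thresh=0.7):
    """Return (start, end) frame windows where saliency *changes* in the
    video and audio streams are strongly correlated (a hypothetical sketch).

    video_saliency, audio_saliency: 1-D arrays of per-frame saliency values.
    """
    # Work on saliency changes (frame-to-frame differences), since the
    # resonance criterion is about correlated *changes*, not raw levels.
    dv = np.diff(np.asarray(video_saliency, dtype=float))
    da = np.diff(np.asarray(audio_saliency, dtype=float))
    regions = []
    for start in range(len(dv) - win + 1):
        v = dv[start:start + win]
        a = da[start:start + win]
        # Skip degenerate windows where correlation is undefined.
        if v.std() == 0 or a.std() == 0:
            continue
        # Pearson correlation of the two change signals in this window.
        r = np.corrcoef(v, a)[0, 1]
        if r >= thresh:
            regions.append((start, start + win))
    return regions
```

Overlapping windows that pass the threshold could then be merged into contiguous resonance regions, which serve as the seeds for the similarity propagation stage.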