We propose a novel way of combining weakly associated video/audio and text streams in an unsupervised manner, which is faster than conventional speech recognition. The technique aligns the audio/video and text streams, enabling video search using the associated text. Multimedia of this form includes news broadcasts with summaries, parliamentary proceedings and court trials with transcripts, sports telecasts with text commentary, etc. We also show how to annotate the video with the names of the people appearing in it, which allows name-based indexing and search. We test the technique on an 80-minute video segment downloaded from the website of the International Court of the Former Yugoslavia, together with the corresponding transcripts. The proposed technique achieves 88.49% accuracy on sentence-level alignment and 95.5% accuracy on the task of assigning names to faces.