We propose an unsupervised framework for recognizing animals in videos using subtitles. In this framework, animals are aligned with their names by an Expectation Maximization algorithm adapted to two very different settings: (1) when bounding boxes are available, and (2) when the frame as a whole is used instead of bounding boxes. With the goal of maximizing precision, recall, and F-measure, the experiments compare a range of natural language processing approaches and visual features for associating animal names in the subtitles with visual patterns. The proposed unsupervised methods obtain 83.1% F1 with bounding boxes and 65.7% F1 without bounding boxes in a fully automated pipeline.
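To make the alignment idea concrete, the following is a minimal sketch of how an Expectation Maximization loop could associate names with visual patterns from co-occurrence alone. It is not the method of the paper: it assumes each video segment has already been reduced to a list of candidate names from the subtitles and a list of discrete visual cluster labels, and it uses a simple translation-style model. The function name `em_align`, the cluster labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def em_align(pairs, iterations=20):
    """Estimate p(visual_label | name) from co-occurring names and labels.

    `pairs` is a list of (names, visual_labels) tuples, one per video segment.
    This is an illustrative assumption about the input, not the paper's format.
    """
    names = sorted({n for ns, _ in pairs for n in ns})
    visuals = sorted({v for _, vs in pairs for v in vs})

    # Start from a uniform distribution over visual labels for every name.
    t = {n: {v: 1.0 / len(visuals) for v in visuals} for n in names}

    for _ in range(iterations):
        # E-step: share each visual label among the names in the same segment,
        # in proportion to the current probabilities.
        counts = {n: defaultdict(float) for n in names}
        for ns, vs in pairs:
            for v in vs:
                z = sum(t[n][v] for n in ns)
                if z == 0:
                    continue  # no name in this segment can explain the label
                for n in ns:
                    counts[n][v] += t[n][v] / z

        # M-step: renormalize the expected counts per name.
        for n in names:
            total = sum(counts[n].values())
            if total > 0:
                t[n] = {v: counts[n][v] / total for v in visuals}
    return t


# Toy data: "c1" and "c2" stand in for visual clusters (hypothetical labels).
segments = [
    (["lion", "zebra"], ["c1", "c2"]),
    (["lion"], ["c1"]),
    (["zebra"], ["c2"]),
]
table = em_align(segments)
best = {name: max(probs, key=probs.get) for name, probs in table.items()}
print(best)
```

In this toy setting the ambiguous first segment is resolved by the two unambiguous ones, which is the intuition behind using EM when no labeled name-to-image pairs are available.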