Cross-modal alignment for wildlife recognition

Authors:
Thibaut Dusart;Aparna Nurani Venkitasubramanian;Marie-Francine Moens
Affiliations:
KU Leuven, Leuven, Belgium;KU Leuven, Leuven, Belgium;KU Leuven, Leuven, Belgium
Venue:
Proceedings of the 2nd ACM international workshop on Multimedia analysis for ecological data
Year:
2013

Abstract

We propose an unsupervised framework for recognizing animals in videos using subtitles. In this framework, the alignment between animals and their names is performed using an Expectation-Maximization (EM) algorithm that is adapted to two very different circumstances: (1) when bounding boxes are available and (2) when the frame as a whole is used instead of bounding boxes. With the goal of maximizing precision, recall and F-measure, the experiments compare a range of natural language processing approaches and visual features for associating animal names in the subtitles with visual patterns. The proposed unsupervised methods obtain 83.1% F1 using bounding boxes and 65.7% F1 without bounding boxes in a fully automated pipeline.
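
To make the alignment idea concrete, the following is a minimal sketch of EM-based name-to-visual-pattern alignment. It is not the authors' implementation: it assumes an IBM Model 1 style formulation in which each video segment contributes a set of candidate animal names (from subtitles) and a set of visual cluster IDs (from bounding boxes or whole frames), and it estimates p(cluster | name). The function names `em_align` and `best_cluster`, the integer cluster IDs, and the toy data are all hypothetical.

```python
from collections import defaultdict


def em_align(pairs, n_iter=20):
    """Estimate t[name][cluster] = p(cluster | name) with EM.

    pairs: list of (names, clusters) tuples, one per video segment.
    Assumption: IBM Model 1 style alignment, chosen for illustration only.
    """
    # Segments lacking names or clusters carry no alignment evidence.
    pairs = [(ns, cs) for ns, cs in pairs if ns and cs]
    names = sorted({n for ns, _ in pairs for n in ns})
    clusters = sorted({c for _, cs in pairs for c in cs})

    # Uniform initialisation over all clusters.
    t = {n: {c: 1.0 / len(clusters) for c in clusters} for n in names}

    for _ in range(n_iter):
        counts = {n: defaultdict(float) for n in names}
        # E-step: share each observed cluster among the co-occurring names.
        for ns, cs in pairs:
            for c in cs:
                z = sum(t[n][c] for n in ns)
                for n in ns:
                    counts[n][c] += t[n][c] / z
        # M-step: renormalise the expected counts per name.
        for n in names:
            total = sum(counts[n].values())
            t[n] = {c: counts[n][c] / total for c in clusters}
    return t


def best_cluster(t):
    """Map each name to its most probable visual cluster."""
    return {n: max(probs, key=probs.get) for n, probs in t.items()}


# Toy data (hypothetical): names from subtitles, cluster IDs from visual features.
segments = [
    (["lion", "zebra"], [0, 1]),
    (["lion"], [0]),
    (["zebra", "giraffe"], [1, 2]),
    (["giraffe"], [2]),
]
alignment = best_cluster(em_align(segments))
print(alignment)
```

In this toy setup the unambiguous segments (a single name with a single cluster) pull probability mass toward the correct pairing, which in turn disambiguates the segments that mention two animals. The whole-frame setting described in the abstract would differ mainly in how the visual clusters are obtained, not in this update loop.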