Cross-modal alignment for wildlife recognition

  • Authors:
  • Thibaut Dusart; Aparna Nurani Venkitasubramanian; Marie-Francine Moens

  • Affiliations:
  • KU Leuven, Leuven, Belgium; KU Leuven, Leuven, Belgium; KU Leuven, Leuven, Belgium

  • Venue:
  • Proceedings of the 2nd ACM international workshop on Multimedia analysis for ecological data
  • Year:
  • 2013

Abstract

We propose an unsupervised framework for recognizing animals in videos using subtitles. In this framework, animals are aligned with their names by an Expectation-Maximization algorithm that is adapted to two very different settings: (1) when bounding boxes are available and (2) when the frame as a whole is used instead of bounding boxes. With the goal of maximizing precision, recall, and F-measure, the experiments compare a range of natural language processing approaches and visual features for associating animal names in the subtitles with visual patterns. The proposed unsupervised methods obtain 83.1% F1 using bounding boxes and 65.7% F1 without bounding boxes in a fully automated pipeline.
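
The abstract does not spell out the EM formulation, so the following is only a minimal sketch of how such a name-to-pattern alignment could work, written in the style of IBM Model 1 from machine translation rather than as the authors' actual method. It assumes that each frame (or bounding box) has already been mapped to a discrete visual pattern id, for example by clustering visual features, and that animal names have already been extracted from the subtitles near that frame. The function names `em_align` and `best_name`, the pattern ids, and the toy data are all hypothetical.

```python
from collections import defaultdict


def em_align(pairs, n_iter=20):
    """Estimate p(pattern | name) from co-occurring names and patterns.

    pairs: list of (names, patterns) tuples, one per video segment, where
    names are animal names found in the subtitles and patterns are
    discrete visual pattern ids (an assumption of this sketch).
    """
    names = sorted({n for ns, _ in pairs for n in ns})
    patterns = sorted({v for _, vs in pairs for v in vs})

    # Uniform initialisation of p(pattern | name).
    t = {n: {v: 1.0 / len(patterns) for v in patterns} for n in names}

    for _ in range(n_iter):
        counts = {n: defaultdict(float) for n in names}

        # E-step: share each observed pattern among the candidate names
        # of its segment, in proportion to the current p(pattern | name).
        for ns, vs in pairs:
            for v in vs:
                z = sum(t[n][v] for n in ns)
                if z == 0:
                    continue  # no candidate name explains this pattern
                for n in ns:
                    counts[n][v] += t[n][v] / z

        # M-step: renormalise the expected counts for each name.
        for n in names:
            total = sum(counts[n].values())
            if total > 0:
                t[n] = {v: counts[n][v] / total for v in patterns}

    return t


def best_name(t, pattern):
    """Return the name that assigns the highest probability to a pattern."""
    return max(t, key=lambda n: t[n][pattern])


if __name__ == "__main__":
    # Toy data: ambiguous segments mention two animals, others only one.
    toy_pairs = [
        (["lion", "zebra"], ["c0", "c1"]),
        (["lion"], ["c0"]),
        (["zebra", "giraffe"], ["c1", "c2"]),
        (["giraffe"], ["c2"]),
    ]
    table = em_align(toy_pairs)
    for pattern in ["c0", "c1", "c2"]:
        print(pattern, best_name(table, pattern))
```

On this toy input the unambiguous segments pull the ambiguous ones toward a consistent assignment, so each pattern ends up labeled with a different name. The two settings mentioned in the abstract would differ mainly in where the patterns come from: from features of each bounding box in the first case, and from features of the whole frame in the second.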