MAMI: multimodal annotations on a camera phone

  • Authors:
  • Xavier Anguera, Nuria Oliver

  • Affiliations:
  • Telefónica Research, Barcelona, Spain (both authors)

  • Venue:
  • Proceedings of the 10th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI)
  • Year:
  • 2008

Abstract

We present MAMI (Multimodal Automatic Mobile Indexing), a mobile-phone prototype that allows users to annotate and search digital photos on their camera phone via speech input. MAMI is implemented as a mobile application that runs in real time on the phone. Users can add speech annotations at the time of capturing a photo or at a later time. Additional metadata is stored with each photo, including location, user identification, date and time of capture, and image-based features. Users can search their personal photo repository by means of speech. Because MAMI does not require connectivity to a server, we propose a Dynamic Time Warping (DTW)-based metric, rather than full-fledged speech recognition, to measure the distance between the spoken query and all stored speech annotations. We present preliminary results with the MAMI prototype and outline future directions of research, including the integration of the additional metadata into the search.
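
As background for the search mechanism the abstract describes, the sketch below shows one way a DTW-based distance between a spoken query and stored speech annotations could be computed and used to rank photos. It is a minimal illustration, not the paper's implementation: the MFCC-style frame features, the step pattern, the path-length normalization, and names such as `dtw_distance`, `search`, and `speech_features` are assumptions introduced here for clarity.

```python
import numpy as np

def dtw_distance(query, annotation):
    """Dynamic Time Warping distance between two feature sequences.

    query, annotation: 2-D arrays of shape (num_frames, num_features),
    e.g. MFCC-style frames extracted from the spoken query and from a
    stored speech annotation (feature choice is an assumption here).
    """
    n, m = len(query), len(annotation)
    # Pairwise Euclidean distances between every query frame and
    # every annotation frame.
    cost = np.linalg.norm(query[:, None, :] - annotation[None, :, :], axis=2)

    # Accumulated-cost matrix with a standard step pattern
    # (diagonal match, insertion, deletion).
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1],  # match
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
            )
    # Normalize by path length so short and long annotations
    # compare fairly.
    return acc[n, m] / (n + m)

def search(query_features, annotated_photos):
    """Rank stored photos by DTW distance between the spoken query
    and each photo's speech annotation (lower distance = better match)."""
    return sorted(
        annotated_photos,
        key=lambda photo: dtw_distance(query_features, photo["speech_features"]),
    )

# Example: rank two stored photos against a spoken query. Random
# placeholder arrays stand in for real acoustic feature frames.
rng = np.random.default_rng(0)
photos = [
    {"id": 1, "speech_features": rng.standard_normal((40, 13))},
    {"id": 2, "speech_features": rng.standard_normal((55, 13))},
]
query = rng.standard_normal((45, 13))
best_match = search(query, photos)[0]
```

A template-matching scheme like this avoids the memory and compute cost of a full speech recognizer on the handset: the query is compared directly against stored annotations, so no language model or vocabulary is needed, at the cost of search time growing with the number of annotations.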