Text versus speech: a comparison of tagging input modalities for camera phones

  • Authors:
  • Mauro Cherubini;Xavier Anguera;Nuria Oliver;Rodrigo de Oliveira

  • Affiliations:
  • Telefónica Research, Barcelona, Spain;Telefónica Research, Barcelona, Spain;Telefónica Research, Barcelona, Spain;Telefónica Research, Barcelona, Spain

  • Venue:
  • Proceedings of the 11th International Conference on Human-Computer Interaction with Mobile Devices and Services
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Speech and typed text are two common input modalities for mobile phones. However, little research has compared them in their ability to support annotation and retrieval of digital pictures on mobile devices. In this paper, we report the results of a month-long field study in which participants took pictures with their camera phones and had the choice of adding annotations using speech, typed text, or both. Subsequently, the same subjects participated in a controlled experiment where they were asked to retrieve images based on annotations as well as retrieve annotations based on images in order to study the ability of each modality to effectively support users' recall of the previously captured pictures. Results demonstrate that each modality has advantages and shortcomings for the production of tags and retrieval of pictures. Several guidelines are suggested when designing tagging applications for portable devices.