Text versus speech: a comparison of tagging input modalities for camera phones

Authors:
Mauro Cherubini; Xavier Anguera; Nuria Oliver; Rodrigo de Oliveira
Affiliations:
Telefónica Research, Barcelona, Spain; Telefónica Research, Barcelona, Spain; Telefónica Research, Barcelona, Spain; Telefónica Research, Barcelona, Spain
Venue:
Proceedings of the 11th International Conference on Human-Computer Interaction with Mobile Devices and Services
Year:
2009

Abstract

Speech and typed text are two common input modalities for mobile phones. However, little research has compared how well each supports the annotation and retrieval of digital pictures on mobile devices. In this paper, we report the results of a month-long field study in which participants took pictures with their camera phones and could annotate them using speech, typed text, or both. The same participants then took part in a controlled experiment in which they were asked to retrieve images based on annotations and to retrieve annotations based on images. The experiment examined how effectively each modality supported users' recall of the pictures they had previously captured. The results show that each modality has advantages and shortcomings for both the production of tags and the retrieval of pictures. We suggest several guidelines for designing tagging applications for portable devices.