Multimodal photo annotation and retrieval on a mobile phone
MIR '08 Proceedings of the 1st ACM international conference on Multimedia information retrieval
We present MAMI (Multimodal Automatic Mobile Indexing), a prototype that allows users to annotate and search for digital photos on their camera phone via speech input. MAMI is implemented as a mobile application that runs in real time on the phone. Users can add speech annotations at the time of capture or at a later time. Additional metadata, such as location, user identification, date and time of capture, and image-based features, is stored with each photo. Users can search for photos in their personal repository by means of speech. Because MAMI needs no connectivity to a server, full-fledged speech recognition is not feasible; instead, we propose a Dynamic Time Warping-based metric that measures the distance between the spoken query and all stored speech annotations. We present our preliminary results with the MAMI prototype and outline future directions of research, including the integration of additional metadata into the search.
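To make the matching approach concrete, the following is a minimal sketch in Python of how a Dynamic Time Warping distance could rank stored annotations against a spoken query. The abstract only states that a DTW-based metric is used; the choice of per-frame feature vectors (e.g., MFCCs), the length normalization, and the function names and data layout here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dtw_distance(query, annotation):
    """DTW distance between two speech feature sequences.

    query, annotation: 2-D arrays of shape (num_frames, num_features),
    e.g. per-frame MFCC vectors (an assumed feature choice; the paper
    only specifies a DTW-based metric).
    """
    n, m = len(query), len(annotation)
    # Pairwise Euclidean distances between every query frame
    # and every annotation frame.
    cost = np.linalg.norm(query[:, None, :] - annotation[None, :, :], axis=2)

    # Accumulated-cost matrix filled via the standard DTW recurrence.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    # Normalize by combined length so annotations of different
    # durations compare fairly (a common convention, assumed here).
    return acc[n, m] / (n + m)

def search(query_feats, annotated_photos):
    """Rank photos by DTW distance between the spoken query and each
    photo's stored speech annotation (hypothetical data layout)."""
    scored = [(dtw_distance(query_feats, feats), photo)
              for photo, feats in annotated_photos]
    return sorted(scored, key=lambda pair: pair[0])
```

Because DTW compares raw feature sequences directly, this style of matching avoids the vocabulary models and server-side resources that full speech recognition would require, at the cost of scanning every stored annotation per query.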