A growing number of tools are becoming available that make use of existing tags to help organize and retrieve photos, facilitating the management and use of photo sets. The tagging on which these techniques rely, however, remains a time-consuming, labor-intensive task that discourages many users. To address this problem, we aim to leverage the multimodal content of naturally occurring photo discussions among friends and family to automatically extract tags from a combination of conversational speech, handwriting, and photo content analysis. While such discussions are rich sources of information about photos, methods must be developed to reliably extract a set of discriminative tags from this noisy, unconstrained group discourse. To this end, this paper contributes an analysis of pilot data that identifies robust multimodal features by examining the interplay between photo content and other modalities such as speech and handwriting. Our analysis is motivated by a search for design implications for effectively incorporating automated location and person identification (e.g., based on GPS and facial recognition technologies) into a system able to extract tags from natural multimodal conversations.
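To make the core idea concrete, the sketch below shows one plausible way to exploit the cross-modal redundancy the abstract describes: candidate tags from each modality accumulate weighted evidence, so a term that is both spoken and handwritten outranks terms seen in only one source. This is a minimal illustration, not the paper's actual method; the extractor inputs, the `score_tags` helper, and the per-modality weights are all hypothetical assumptions.

```python
from collections import Counter

# Hypothetical per-modality candidate lists; in a real pipeline these
# would come from an ASR transcript, handwriting recognition output,
# and photo content analysis (e.g. GPS reverse-geocoding, face
# recognition). Exact weights are illustrative assumptions.
def score_tags(speech_terms, handwriting_terms, photo_terms,
               weights=(1.0, 1.5, 2.0)):
    """Score candidate tags by weighted cross-modal agreement.

    A term that recurs across modalities accumulates evidence from
    each source, reflecting the redundancy between conversational
    speech, handwriting, and photo content that the paper targets.
    """
    w_speech, w_hand, w_photo = weights
    scores = Counter()
    for term in speech_terms:
        scores[term.lower()] += w_speech
    for term in handwriting_terms:
        scores[term.lower()] += w_hand
    for term in photo_terms:
        scores[term.lower()] += w_photo
    return scores

# Toy usage: "grandma" and "beach" each appear in two modalities,
# so they outrank single-modality terms like "summer" or "1998".
speech = ["grandma", "beach", "last", "summer"]
handwriting = ["grandma", "1998"]
photo = ["beach"]  # e.g. scene classification / GPS place name
print(score_tags(speech, handwriting, photo).most_common(3))
# [('beach', 3.0), ('grandma', 2.5), ('1998', 1.5)]
```

In practice the weighting would need to reflect each recognizer's noise level (ASR on unconstrained group conversation is far noisier than handwriting recognition), which is precisely the kind of trade-off the pilot-data analysis is meant to inform.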