A growing number of tools are becoming available that make use of existing tags to help organize and retrieve photos, facilitating the management and use of photo sets. The tagging on which these techniques rely, however, remains a time-consuming, labor-intensive task that discourages many users. To address this problem, we aim to leverage the multimodal content of naturally occurring photo discussions among friends and family to automatically extract tags from a combination of conversational speech, handwriting, and photo content analysis. While such discussions are rich sources of information about photos, methods must be developed to reliably extract a set of discriminative tags from this noisy, unconstrained group discourse. To this end, this paper contributes an analysis of pilot data that identifies robust multimodal features by examining the interplay between photo content and other modalities such as speech and handwriting. Our analysis is motivated by a search for design implications for effectively incorporating automated location and person identification (e.g., based on GPS and facial recognition technologies) into a system able to extract tags from natural multimodal conversations.
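To make the core idea concrete, the sketch below shows one plausible way to exploit the cross-modal redundancy the abstract describes: candidate tags from each modality accumulate weighted evidence, so a term that is both spoken and handwritten outranks terms seen in only one source. This is a minimal illustration, not the paper's actual method; the extractor inputs, the `score_tags` helper, and the per-modality weights are all hypothetical assumptions.

```python
from collections import Counter

# Hypothetical per-modality candidate lists; in a real pipeline these
# would come from an ASR transcript, handwriting recognition output,
# and photo content analysis (e.g. GPS reverse-geocoding, face
# recognition). Exact weights are illustrative assumptions.
def score_tags(speech_terms, handwriting_terms, photo_terms,
               weights=(1.0, 1.5, 2.0)):
    """Score candidate tags by weighted cross-modal agreement.

    A term that recurs across modalities accumulates evidence from
    each source, reflecting the redundancy between conversational
    speech, handwriting, and photo content that the paper targets.
    """
    w_speech, w_hand, w_photo = weights
    scores = Counter()
    for term in speech_terms:
        scores[term.lower()] += w_speech
    for term in handwriting_terms:
        scores[term.lower()] += w_hand
    for term in photo_terms:
        scores[term.lower()] += w_photo
    return scores

# Toy usage: "grandma" and "beach" each appear in two modalities,
# so they outrank single-modality terms like "summer" or "1998".
speech = ["grandma", "beach", "last", "summer"]
handwriting = ["grandma", "1998"]
photo = ["beach"]  # e.g. scene classification / GPS place name
print(score_tags(speech, handwriting, photo).most_common(3))
# [('beach', 3.0), ('grandma', 2.5), ('1998', 1.5)]
```

In practice the weighting would need to reflect each recognizer's noise level (ASR on unconstrained group conversation is far noisier than handwriting recognition), which is precisely the kind of trade-off the pilot-data analysis is meant to inform.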