The picture says it all!: multimodal interactions and interaction metadata

Authors:
Ramadevi Vennelakanti;Prasenjit Dey;Ankit Shekhawat;Phanindra Pisupati
Affiliations:
Hewlett-Packard Labs, Bangalore, India;Hewlett-Packard Labs, Bangalore, India;Hewlett-Packard Labs, Bangalore, India;Hewlett-Packard Labs, Bangalore, India
Venue:
ICMI '11 Proceedings of the 13th international conference on multimodal interfaces
Year:
2011

Citing 10
Cited 2

Storytelling with digital photographs

Proceedings of the SIGCHI conference on Human Factors in Computing Systems
Requirements for photoware

CSCW '02 Proceedings of the 2002 ACM conference on Computer supported cooperative work
Augmenting Photographs with Audio

Personal and Ubiquitous Computing
Aria: An Agent for Annotating and Retrieving Images

Computer
How do people manage their digital photographs?

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
From context to content: leveraging context to infer media metadata

Proceedings of the 12th annual ACM international conference on Multimedia
Exploring the potentials of combining photo annotating tasks with instant messaging fun

Proceedings of the 3rd international conference on Mobile and ubiquitous multimedia
Audiophotography: Bringing photos to life with sounds (The Computer Supported Cooperative Work Series)

Audiophotography: Bringing photos to life with sounds (The Computer Supported Cooperative Work Series)
Toward content-aware multimodal tagging of personal photo collections

Proceedings of the 9th international conference on Multimodal interfaces
Robust user context analysis for multimodal interfaces

ICMI '11 Proceedings of the 13th international conference on multimodal interfaces

Pixene: creating memories while sharing photos

Proceedings of the 14th ACM international conference on Multimodal interaction
Producing while consuming: social interaction around photos shared within private group

ACE'12 Proceedings of the 9th international conference on Advances in Computer Entertainment

Quantified Score

Hi-index	0.00

Visualization

Abstract

People share photographs with family and friends! This inclination to share photographs lends itself to many occasions of co-present sharing resulting in interesting interactions, discussions, and experiences among those present. These interactions, are rich in information about the context and the content of the photograph and if extracted can be used to associate metadata with the photograph. However these are rarely captured and so, are lost at the end of the co-present photo sharing session. Most current work on extracting implicit metadata focuses on Content metadata - analyzing the content in a photograph and Object metadata that is automatically generated and consists of data like GPS location, date and time etc. We address the capture of another interesting type of implicit metadata, called the "Interaction metadata", from the user's multimodal interactions with the media (here photographs) during co-present sharing. These interactions in the context of photographs contain rich information: who saw it, who said what, what was pointed at when they said it, who did they see it with for how long, how many times and so on; which if captured and analyzed can create interesting memories about the photograph. These will over time, help build stories around photographs, aid storytelling, serendipitous discovery and efficient retrieval among other experiences. Interaction metadata can also help organize photographs better by providing mechanisms for filtering based on, who viewed, most viewed, etc. Interaction metadata provides a hereto under explored implicit metadata type created from interactions with media. We designed and built a system prototype to capture and create interaction metadata. In this paper we describe the prototype and present the findings of a study we carried out to evaluate this prototype. The contribution of our work to the domain of multimodal interactions are: a method of identifying relevant speech portions in a free flowing conversation and the use of natural human interactions in the context of media to create Interaction Metadata, a novel type of implicit metadata.