Collaborative multimodal photo annotation over digital paper

  • Authors:
  • Paulo Barthelmess; Edward Kaiser; Xiao Huang; David McGee; Philip Cohen

  • Affiliations:
  • Natural Interaction Systems, Seattle, WA (all authors)

  • Venue:
  • Proceedings of the 8th international conference on Multimodal interfaces
  • Year:
  • 2006

Abstract

The availability of metadata annotations over media content such as photos is known to enhance retrieval and organization, particularly for large data sets. The greatest challenge for obtaining annotations remains getting users to perform the large amount of tedious manual work that is required.

In this paper we introduce an approach for semi-automated labeling based on the extraction of metadata from naturally occurring conversations of groups of people discussing pictures among themselves.

As the burden of structuring and extracting metadata is shifted from users to the system, new recognition challenges arise. We explore how multimodal language can help in 1) detecting a concise set of meaningful labels to be associated with each photo, 2) achieving robust recognition of these key semantic terms, and 3) facilitating label propagation via multimodal shortcuts. Analysis of data from a preliminary pilot collection suggests that handwritten labels may be highly indicative of the semantics of each photo, as indicated by the correlation of handwritten terms with high-frequency spoken ones. We point to initial directions in exploring a multimodal fusion technique to recover robust spellings and pronunciations of these high-value terms from redundant speech and handwriting.
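To make the handwriting/speech correlation idea concrete, the sketch below is an illustrative toy, not the paper's system: the tokenizer, stopword list, similarity threshold, and example transcripts are all assumptions. It pairs each handwritten label with the most similar high-frequency spoken term from the surrounding conversation, which is one simple way to check whether handwriting tends to flag terms that speech redundantly confirms.

```python
from collections import Counter
from difflib import SequenceMatcher

# Illustrative only: real inputs would come from speech and handwriting
# recognizers; here they are plain strings for demonstration.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "that", "was", "so", "we"}

def tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]

def top_spoken_terms(transcript, k=10):
    """The k most frequent content words in the spoken transcript."""
    return [term for term, _ in Counter(tokens(transcript)).most_common(k)]

def match_labels(handwritten, spoken, threshold=0.8):
    """Pair each handwritten label with a similar high-frequency spoken term,
    tolerating recognition errors via approximate string matching."""
    pairs = []
    for label in handwritten:
        best = max(spoken, key=lambda s: SequenceMatcher(None, label, s).ratio(),
                   default=None)
        if best and SequenceMatcher(None, label, best).ratio() >= threshold:
            pairs.append((label, best))
    return pairs

if __name__ == "__main__":
    spoken = top_spoken_terms(
        "that was the day we hiked up to glacier point and glacier point was so windy"
    )
    handwritten = ["glacier", "yosemite"]   # labels written on the digital paper
    print(match_labels(handwritten, spoken))  # e.g. [('glacier', 'glacier')]
```

Labels that find a high-frequency spoken counterpart would be the "high-value" terms whose spelling (from handwriting) and pronunciation (from speech) a fusion step could then mutually reinforce.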