Seeing and reading red: hue and color-word correlation in images and attendant text on the WWW

  • Authors: Shawn Newsam
  • Affiliations: University of California at Merced, Merced, CA
  • Venue: MDM '05 Proceedings of the 6th international workshop on Multimedia data mining: mining integrated media and complex data
  • Year: 2005

Abstract

This work represents an initial investigation into determining whether correlations actually exist between metadata and content descriptors in multimedia datasets. We provide a quantitative method for evaluating whether the hue of images on the WWW is correlated with the occurrence of color-words in metadata such as URLs, image names, and attendant text. It turns out that such a correlation does exist: the likelihood that a particular color appears in an image whose URL, name, and/or attendant text contains the corresponding color-word is generally at least twice the likelihood that the color appears in a randomly chosen image on the WWW. While this finding might not be significant in and of itself, it represents an initial step towards quantitatively establishing that other, perhaps more useful correlations exist. These correlations form the basis for exciting novel approaches that leverage semi-supervised datasets, such as the WWW, to overcome the semantic gap that has hampered progress in multimedia information retrieval for some time now.
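The comparison described in the abstract, the likelihood that a color appears in an image given that its metadata contains the color-word versus the likelihood for an arbitrary image, can be illustrated with a small sketch. This is not the paper's actual procedure: the hue ranges, the saturation and brightness cutoffs, the `min_fraction` threshold, the record layout, and the function names `color_present` and `likelihood_ratio` are all assumptions made for illustration.

```python
import colorsys

# Hypothetical hue ranges in degrees; the abstract does not specify any.
HUE_RANGES = {
    "red": [(0, 15), (345, 360)],
    "green": [(90, 150)],
    "blue": [(210, 270)],
}


def color_present(pixels, color, min_fraction=0.2, min_sat=0.3, min_val=0.3):
    """Return True if at least min_fraction of the pixels fall in the color's hue range.

    pixels is a list of (r, g, b) tuples with components in 0..255.
    All thresholds are illustrative assumptions.
    """
    if not pixels:
        return False
    hits = 0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        degrees = h * 360
        in_range = any(lo <= degrees < hi for lo, hi in HUE_RANGES[color])
        # Ignore washed-out or very dark pixels, whose hue is unreliable.
        if in_range and s >= min_sat and v >= min_val:
            hits += 1
    return hits / len(pixels) >= min_fraction


def likelihood_ratio(images, color):
    """Estimate P(color in image | color-word in metadata) / P(color in image).

    images is a list of dicts with keys "text" (URL, name, or attendant text)
    and "pixels". Returns None when either likelihood cannot be estimated.
    """
    has_color = [color_present(img["pixels"], color) for img in images]
    # Naive substring match: "red" would also match "credit".
    with_word = [
        present
        for img, present in zip(images, has_color)
        if color in img["text"].lower()
    ]
    if not with_word or not any(has_color):
        return None
    p_conditional = sum(with_word) / len(with_word)
    p_baseline = sum(has_color) / len(images)
    return p_conditional / p_baseline
```

A ratio near 1 would indicate no association between the color-word and the hue content, while the abstract reports ratios that are generally at least 2. A real study would also need tokenized word matching and a much larger image sample than a toy list of records.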