Combining Textual and Visual Cues for Content-based Image Retrieval on the World Wide Web

  • Authors:
  • Marco La Cascia; Sarathendu Sethi; Stan Sclaroff

  • Venue:
  • IEEE Workshop on Content-Based Access of Image and Video Libraries (CAIVL '98)
  • Year:
  • 1998

Abstract

Some WWW image engines allow the user to form a query in terms of text keywords. To build the image index, keywords are extracted heuristically from the HTML documents containing each image, and/or from the image URL and file headers. Unfortunately, text-based image engines have merely retro-fitted standard SQL database query methods, and it is difficult to include image cues within such a framework. On the other hand, visual statistics (e.g., color histograms) are often insufficient for helping users find desired images in a vast WWW index. By truly unifying textual and visual statistics, one would expect to get better results than with either used separately. In this paper, we propose an approach that allows the combination of visual statistics with textual statistics in the vector space representation commonly used in query-by-image-content systems. Text statistics are captured in vector form using latent semantic indexing (LSI). The LSI index for an HTML document is then associated with each of the images contained therein. Visual statistics (e.g., color, orientedness) are also computed for each image. The LSI and visual statistic vectors are then combined into a single index vector that can be used for content-based search of the resulting image database. By using an integrated approach, we are able to take advantage of possible statistical couplings between the topic of the document (latent semantic content) and the contents of images (visual statistics). This allows improved performance in conducting content-based search. This approach has been implemented in a WWW image search engine prototype.
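
The following is a minimal sketch, not the authors' implementation, of the kind of pipeline the abstract describes: computing an LSI vector from the text of the containing HTML document, computing a simple visual statistic (here a color histogram; the paper also mentions orientedness), concatenating the two into one index vector, and ranking images by cosine similarity. The dimensionality k, the text/visual weighting, and the histogram bin count are illustrative assumptions.

    # Sketch of combining LSI text vectors with visual statistics for image search.
    # Parameter choices (k, bins, text_weight) are assumptions, not the paper's values.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    def lsi_vectors(html_texts, k=64):
        """Project document term statistics into a k-dimensional LSI space."""
        tfidf = TfidfVectorizer(stop_words="english")
        term_doc = tfidf.fit_transform(html_texts)          # term-document matrix
        svd = TruncatedSVD(n_components=k, random_state=0)  # latent semantic indexing
        return svd.fit_transform(term_doc)                  # one LSI vector per document

    def color_histogram(image_rgb, bins=8):
        """Simple visual statistic: a normalized per-channel color histogram."""
        hist = [np.histogram(image_rgb[..., c], bins=bins, range=(0, 255))[0]
                for c in range(3)]
        hist = np.concatenate(hist).astype(float)
        return hist / (hist.sum() + 1e-9)

    def combined_index(lsi_vec, visual_vec, text_weight=0.5):
        """Concatenate unit-normalized text and visual vectors into one index vector."""
        t = lsi_vec / (np.linalg.norm(lsi_vec) + 1e-9)
        v = visual_vec / (np.linalg.norm(visual_vec) + 1e-9)
        return np.concatenate([text_weight * t, (1.0 - text_weight) * v])

    def search(query_vec, index_matrix, top_k=10):
        """Rank indexed images by cosine similarity to the query vector."""
        q = query_vec / (np.linalg.norm(query_vec) + 1e-9)
        m = index_matrix / (np.linalg.norm(index_matrix, axis=1, keepdims=True) + 1e-9)
        scores = m @ q
        return np.argsort(-scores)[:top_k]

In this sketch, each image inherits the LSI vector of the HTML page it appears on, so a query image (or a previously indexed image chosen by the user) can retrieve results that match both its visual statistics and the latent topic of its surrounding document.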