Image indexing based on web page segmentation and clustering

  • Authors:
  • Georgina Tryfou; Nicolas Tsapatsoulis

  • Affiliations:
  • Technological University of Cyprus, Department of Communication and Internet Studies, Limassol, Cyprus; Technological University of Cyprus, Department of Communication and Internet Studies, Limassol, Cyprus

  • Venue:
  • ACA'12 Proceedings of the 11th international conference on Applications of Electrical and Computer Engineering
  • Year:
  • 2012

Abstract

Thousands of images are now available on the web. These images are accompanied by a wide range of textual descriptors, such as image file names, anchor text and, of course, surrounding text. Existing systems that mine information about images from surrounding text suffer from several problems, such as the inability to assign all relevant text to an image while discarding irrelevant text. In this paper, we propose a novel method for indexing web images based on textual descriptors. The web document is segmented into visual blocks of text, and each block is then assigned to the closest image. Text extraction is improved by making this assignment follow the intuitive understanding of how close two visual blocks are. The evaluation confirms the validity of the proposed method and points to possible extensions.
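
The assignment step described in the abstract (each visual block of text goes to the closest image) can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the segmentation procedure, so everything below is an assumption made for illustration: blocks are taken to be axis-aligned bounding boxes in rendered-page pixels, "closeness" is approximated by the Euclidean gap between rectangles, and the names `Block`, `gap_distance` and `assign_text_to_images` are hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Block:
    """Bounding box of a rendered visual block, in pixels (assumed representation)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float


def gap_distance(a: Block, b: Block) -> float:
    """Euclidean gap between two rectangles; 0.0 when they touch or overlap.

    This is an illustrative stand-in for the paper's notion of visual
    closeness, which the abstract does not define.
    """
    dx = max(a.left - b.right, b.left - a.right, 0.0)
    dy = max(a.top - b.bottom, b.top - a.bottom, 0.0)
    return hypot(dx, dy)


def assign_text_to_images(text_blocks, images):
    """Map each image name to the names of the text blocks nearest to it."""
    if not images:
        raise ValueError("at least one image block is required")
    assignment = {img.name: [] for img in images}
    for tb in text_blocks:
        # Pick the image with the smallest gap; ties go to the first image listed.
        nearest = min(images, key=lambda img: gap_distance(tb, img))
        assignment[nearest.name].append(tb.name)
    return assignment
```

With two images placed side by side and a caption directly under each, this sketch attaches each caption to the image above it. A real system would presumably also need a cutoff so that distant, irrelevant text is discarded rather than forced onto the nearest image; that threshold is not modeled here.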