Web image indexing by using associated texts

  • Authors:
  • Zhiguo Gong;Leong Hou U.;Chan Wa Cheang

  • Affiliations:
  • Faculty of Science and Technology, University of Macau, Macao, P.R. China;Faculty of Science and Technology, University of Macau, Macao, P.R. China;Faculty of Science and Technology, University of Macau, Macao, P.R. China

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

In order to index Web images, the whole associated texts are partitioned into a sequence of text blocks, then the local relevance of a term to the corresponding image is calculated with respect to both its local occurrence in the block and the distance of the block to the image. Thus, the overall relevance of a term is determined as the sum of all its local weight values multiplied by the corresponding distance factors of the text blocks. In the present approach, the associated text of a Web image is firstly partitioned into three parts, including a page-oriented text (TM), a link-oriented text (LT), and a caption-oriented text (BT). Since the big size and semantic divergence, the caption-oriented text is further partitioned into finer blocks based on the tree structure of the tag elements within the BT text. During the processing, all heading nodes are pulled up in order to correlate with their semantic scopes, and a collapse algorithm is also exploited to remove the empty blocks. In our system, the relevant factors of the text blocks are determined by using a greedy Two-Way-Merging algorithm.