Using structural information to improve search in Web collections

  • Authors:
  • Edleno S. de Moura;David Fernandes;Berthier Ribeiro-Neto;Altigran S. da Silva;Marcos André Gonçalves

  • Affiliations:
  • Department of Computer Science, Federal University of Amazonas, Manaus, AM, Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil;Department of Computer Science, Federal University of Amazonas, Manaus, AM, Brazil;Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, MG, Brazil

  • Venue:
  • Journal of the American Society for Information Science and Technology
  • Year:
  • 2010

Abstract

In this work, we investigate the problem of using the block structure of Web pages to improve ranking results. Starting with basic intuitions provided by the concepts of term frequency (TF) and inverse document frequency (IDF), we propose nine block-weight functions to distinguish the impact of term occurrences inside page blocks, instead of inside whole pages. These functions are then used to compute a modified BM25 ranking function. Using four distinct Web collections, we ran extensive experiments to compare our block-weight ranking formulas with two baselines: (a) a BM25 ranking applied to full pages, and (b) a BM25 ranking that takes into account best blocks. Our results suggest that our block-weighting ranking method is superior to both baselines across all collections we used, with average gains in precision ranging from 5% to 20%. © 2010 Wiley Periodicals, Inc.
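
The general idea of weighting term occurrences by block before applying BM25 can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not define the nine block-weight functions, so the fixed per-block weights, the block names (`main`, `menu`), and the helper names `block_weighted_tf` and `bm25_term_score` are all assumptions made for illustration. The BM25 formula used is the common textbook form with a smoothed, non-negative IDF.

```python
import math
from typing import Dict, List


def block_weighted_tf(term: str,
                      blocks: Dict[str, List[str]],
                      block_weights: Dict[str, float]) -> float:
    """Sum the term's occurrences per block, scaled by a per-block weight.

    The weights are hypothetical constants; blocks without an entry
    default to a weight of 1.0.
    """
    return sum(block_weights.get(name, 1.0) * tokens.count(term)
               for name, tokens in blocks.items())


def bm25_term_score(tf: float, df: int, n_docs: int,
                    doc_len: float, avg_doc_len: float,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Standard BM25 contribution of one term, accepting a real-valued tf."""
    # Smoothed IDF variant that stays non-negative for frequent terms.
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    # Length normalization relative to the collection's average length.
    norm = tf + k1 * (1.0 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / norm


# Hypothetical page split into two blocks, with an assumed low weight
# for the navigation menu.
page = {"main": ["block", "ranking", "block"], "menu": ["block", "home"]}
weights = {"main": 1.0, "menu": 0.2}

weighted_tf = block_weighted_tf("block", page, weights)
plain_tf = float(sum(tokens.count("block") for tokens in page.values()))

weighted_score = bm25_term_score(weighted_tf, df=10, n_docs=1000,
                                 doc_len=5, avg_doc_len=5)
plain_score = bm25_term_score(plain_tf, df=10, n_docs=1000,
                              doc_len=5, avg_doc_len=5)
```

In this sketch the occurrence in the menu block counts for less than an occurrence in the main block, so the block-weighted score falls below the full-page score for the same term. A real system would derive the weights from collection statistics rather than fix them by hand.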