Improving web pages retrieval using combined fields

  • Authors:
  • Carlos G. Figuerola; José L. Alonso Berrocal; Angel F. Zazo Rodríguez; Emilio Rodríguez

  • Affiliations:
  • REINA Research Group, University of Salamanca, Spain (all authors)

  • Venue:
  • CLEF'06 Proceedings of the 7th international conference on Cross-Language Evaluation Forum: evaluation of multilingual and multi-modal information retrieval
  • Year:
  • 2006

Abstract

This article describes the participation of the REINA Research Group of the University of Salamanca in WebCLEF 2006. This year we participated in the Monolingual Mixed Task in Spanish. The entire EuroGOV collection was processed to select all the pages in Spanish. All the pages in the .es domain were also pre-selected. Our objective this year was to test pre-retrieval techniques for combining information fields or elements from web pages, and to assess the retrieval capability of each of these fields. In vector-based retrieval systems, terms coming from different sources can be combined by operating on the frequency of the terms in the document under a tf×idf weighting scheme. The BODY field is, of course, the most useful from the retrieval perspective, but the text of the backlinks brings considerable improvement. META fields or tags, however, contribute little to retrieval improvement.
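
One way to picture combining fields by operating on term frequency is the minimal Python sketch below. It is an illustration under stated assumptions, not the authors' system: the field names (`body`, `anchor`, `meta`), the multipliers in `FIELD_WEIGHTS`, the helper names, and the plain logarithmic idf are all hypothetical, since the abstract gives none of these details.

```python
import math
from collections import Counter

# Hypothetical per-field multipliers; the abstract does not report actual values.
FIELD_WEIGHTS = {"body": 1.0, "anchor": 2.0, "meta": 0.5}


def combined_tf(doc):
    """Merge the fields of one page into a single term-frequency table.

    `doc` maps a field name to its list of tokens. Each occurrence of a
    term adds that field's weight to the term's combined frequency.
    """
    tf = Counter()
    for field, tokens in doc.items():
        weight = FIELD_WEIGHTS.get(field, 1.0)  # unknown fields count once
        for token in tokens:
            tf[token] += weight
    return tf


def tfidf_vectors(docs):
    """Return one {term: tf * idf} vector per page, with idf = ln(N / df)."""
    tfs = [combined_tf(doc) for doc in docs]
    n_docs = len(docs)
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # each page counts once per distinct term
    return [
        {term: freq * math.log(n_docs / df[term]) for term, freq in tf.items()}
        for tf in tfs
    ]


pages = [
    {"body": ["universidad", "salamanca"], "anchor": ["salamanca"]},
    {"body": ["universidad", "madrid"]},
]
vectors = tfidf_vectors(pages)
```

In this sketch a term found in backlink text contributes more to the combined frequency than the same term in the body, so field combination happens before indexing and the retrieval model itself is left unchanged.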