Retrieval of snippets of web pages converted to plain text: more questions than answers

  • Authors:
  • Carlos G. Figuerola;José Luis Alonso Berrocal;Ángel F. Zazo Rodríguez;Montserrat Mateos

  • Affiliations:
  • University of Salamanca, REINA Research Group, Salamanca, Spain;University of Salamanca, REINA Research Group, Salamanca, Spain;University of Salamanca, REINA Research Group, Salamanca, Spain;University of Salamanca, REINA Research Group, Salamanca, Spain

  • Venue:
  • CLEF'08 Proceedings of the 9th Cross-language evaluation forum conference on Evaluating systems for multilingual and multimodal information access
  • Year:
  • 2008

Abstract

This year's WebCLEF task was to retrieve snippets and pieces from documents on various topics. The extraction and selection of the most widely used snippets can be carried out with various methods. However, the way in which web pages are usually converted to plain text introduces a series of problems that cause inefficiency in retrieval. Duplicate information and completely irrelevant or even meaningless snippets are some of these problems. This paper also explores the real impact of using several languages on obtaining relevant fragments.