Experiments with Google news for filtering newswire articles

  • Authors:
  • Arturo Montejo-Ráez;José M. Perea-Ortega;Manuel Carlos Díaz-Galiano;L. Alfonso Ureña-López

  • Affiliations:
  • Computer Science Department, University of Jaén, Jaén, Spain;Computer Science Department, University of Jaén, Jaén, Spain;Computer Science Department, University of Jaén, Jaén, Spain;Computer Science Department, University of Jaén, Jaén, Spain

  • Venue:
  • CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
  • Year:
  • 2009

Abstract

This paper describes an approach that uses Google News as a source of information to generate a learning corpus for an information filtering task. The INFILE (INformation FILtering Evaluation) track of the CLEF (Cross-Language Evaluation Forum) 2009 campaign serves as the framework. Because the information filtering task can be seen as a document classification task, a supervised learning scheme has been followed. Two learning corpora have been tested: one using the text of the topics as learning data to train a classifier, and another in which the training data were generated from Google News pages, using the keywords of the topics as queries. Results show that using Google News to generate learning data does not improve on the results obtained using only the topic descriptions as the learning corpus.
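As an illustration of filtering treated as classification, the following is a minimal sketch in which the topic texts themselves act as the learning data. It is not the classifier used in the paper, which the abstract does not name: a plain bag-of-words cosine similarity stands in for it, and the topic identifiers, topic texts, function names and the threshold value are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def vectorize(text):
    # Bag-of-words term-frequency vector.
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(topics):
    # "Training" here just stores one vector per topic, built from its text.
    return {topic_id: vectorize(text) for topic_id, text in topics.items()}


def filter_document(model, document, threshold=0.2):
    # Return the best-matching topic id, or None when no topic is similar
    # enough (the document is filtered out). The threshold is an assumption.
    doc_vec = vectorize(document)
    scores = {tid: cosine(vec, doc_vec) for tid, vec in model.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical topic descriptions, invented for this sketch.
TOPICS = {
    "t1": "doping in professional cycling and sport drug tests",
    "t2": "solar energy panels and renewable power plants",
}

model = train(TOPICS)
print(filter_document(model, "New drug tests announced for professional cycling teams"))
print(filter_document(model, "Stock markets fell sharply today"))
```

Under the same assumptions, the second corpus described above would correspond to passing `train` the text gathered from news pages for each topic's keywords instead of the topic descriptions; how that text is retrieved is outside this sketch.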