Experiments with Google news for filtering newswire articles

Authors:
Arturo Montejo-Ráez;José M. Perea-Ortega;Manuel Carlos Díaz-Galiano;L. Alfonso Ureña-López
Affiliations:
Computer Science Department, University of Jaén, Jaén, Spain;Computer Science Department, University of Jaén, Jaén, Spain;Computer Science Department, University of Jaén, Jaén, Spain;Computer Science Department, University of Jaén, Jaén, Spain
Venue:
CLEF'09 Proceedings of the 10th cross-language evaluation forum conference on Multilingual information access evaluation: text retrieval experiments
Year:
2009

Abstract

This paper describes an approach that uses Google News as a source of information to generate a learning corpus for an information filtering task. The INFILE (INformation FILtering Evaluation) track of the CLEF (Cross-Language Evaluation Forum) 2009 campaign was used as the framework. Because information filtering can be seen as a document classification task, a supervised learning scheme was followed. Two learning corpora were tested: one using the text of the topics as training data for a classifier, and another where the training data were generated from Google News pages, using the topic keywords as queries. Results show that using Google News to generate learning data does not improve on the results obtained using only the topic descriptions as the learning corpus.
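
To make the "filtering as classification" view concrete, the following is a minimal sketch and not the authors' system. It assumes a simple cosine-similarity profile matcher in place of whatever supervised classifier the paper actually used, and the topic identifiers, topic texts, threshold value, and function names are all hypothetical, invented for illustration only. Each topic is given a profile built from its training texts, and an incoming document is either assigned to the best-matching topic or discarded.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(text):
    """Return a term-frequency vector scaled to unit length."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Dot product of two unit-length sparse vectors (their cosine)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train(topic_corpus):
    """Build one profile vector per topic from its training texts."""
    return {tid: vectorize(" ".join(texts)) for tid, texts in topic_corpus.items()}


def filter_document(doc, profiles, threshold=0.2):
    """Return the best-matching topic id, or None if the document is discarded.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    vec = vectorize(doc)
    best_id, best_score = None, 0.0
    for tid, profile in profiles.items():
        score = cosine(vec, profile)
        if score > best_score:
            best_id, best_score = tid, score
    return best_id if best_score >= threshold else None


# Hypothetical learning corpus: topic descriptions used directly as training text.
topic_corpus = {
    "T1": ["doping in professional cycling races"],
    "T2": ["stock market crash and bank failures"],
}
profiles = train(topic_corpus)

result_match = filter_document(
    "Cyclist suspended after doping test in cycling race", profiles
)
result_none = filter_document("New art museum opens downtown", profiles)
```

Under these assumptions, the second learning corpus described in the abstract would correspond to replacing the texts in `topic_corpus` with news articles retrieved for each topic's keywords, leaving the rest of the sketch unchanged.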