Online Learning for Web Query Generation: Finding Documents Matching a Minority Concept on the Web

  • Authors:
  • Rayid Ghani; Rosie Jones; Dunja Mladenic

  • Venue:
  • WI '01 Proceedings of the First Asia-Pacific Conference on Web Intelligence: Research and Development
  • Year:
  • 2001

Abstract

This paper describes an approach for learning to generate web-search queries for collecting documents matching a minority concept. As a case study, we use the concept of text documents written in Slovenian, a minority natural language on the Web. Individual documents are automatically labeled as relevant or non-relevant using a language filter, and the feedback is used to learn what query lengths and inclusion/exclusion term-selection methods are helpful for finding previously unseen documents in the target language. Our system, CorpusBuilder, learns to select "good" query terms using a variety of term scoring methods. We present empirical results with learning methods that vary the time horizon used when learning from the results of past queries. Our approaches generalize well across several languages regardless of the initial conditions.
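
The loop the abstract outlines (pick query terms, label the results, learn from recent feedback) might be sketched roughly as follows. This is only an illustration under stated assumptions, not CorpusBuilder's actual implementation: `select_terms` and `HorizonLearner` are hypothetical names, raw term frequency stands in for the unspecified term scoring methods, and a sliding window of the last `horizon` queries stands in for the time horizon.

```python
from collections import Counter, deque


def select_terms(relevant, nonrelevant, length):
    """Pick up to `length` inclusion and exclusion terms by raw term frequency.

    Assumption: term frequency is just one possible scoring method.
    """
    rel = Counter(w for doc in relevant for w in doc.lower().split())
    non = Counter(w for doc in nonrelevant for w in doc.lower().split())
    # Inclusion terms: frequent in relevant documents, absent from the others.
    include = [w for w, _ in rel.most_common() if w not in non][:length]
    # Exclusion terms: frequent in non-relevant documents, absent from relevant ones.
    exclude = [w for w, _ in non.most_common() if w not in rel][:length]
    return include, exclude


class HorizonLearner:
    """Track, per arm, whether each of the last `horizon` queries succeeded.

    An "arm" is any query-generation choice, e.g. a query length or a
    (length, scoring method) pair. Hypothetical helper for illustration.
    """

    def __init__(self, arms, horizon):
        self.history = {arm: deque(maxlen=horizon) for arm in arms}

    def update(self, arm, found_new_relevant):
        # Feedback comes from the language filter applied to the query's results.
        self.history[arm].append(1 if found_new_relevant else 0)

    def success_rate(self, arm):
        h = self.history[arm]
        # Untried arms get an optimistic 1.0 so each is tried at least once.
        return sum(h) / len(h) if h else 1.0

    def best_arm(self):
        # Ties are broken by the order in which arms were given.
        return max(self.history, key=self.success_rate)
```

In use, one would call `best_arm()` to choose a query length, build the query with `select_terms`, run it against a search engine, pass the returned documents through the language filter, and then call `update` with whether any previously unseen target-language document was found. A small `horizon` makes the learner react quickly to recent results, while a large one averages over a longer history.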