ACM Transactions on Internet Technology (TOIT)
This paper describes an approach for learning to generate web search queries that collect documents matching a minority concept. As a case study, we use the concept of text documents written in Slovenian, a minority natural language on the Web. A language filter automatically labels each retrieved document as relevant or non-relevant, and this feedback is used to learn which query lengths and which inclusion/exclusion term-selection methods help find previously unseen documents in the target language. Our system, CorpusBuilder, learns to select "good" query terms using a variety of term scoring methods. We present empirical results for learning methods that vary the time horizon used when learning from the results of past queries. Our approaches generalize well across several languages, regardless of the initial conditions.
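The abstract does not spell out the term scoring methods or the learning rule, so the following is only a hedged Python sketch of the general idea under stated assumptions. It assumes a smoothed log odds ratio of document frequency as one possible term score, builds a query from the top-scoring inclusion terms and the lowest-scoring exclusion terms (written with a `-` prefix, as many search engines accept), and picks among candidate query settings (here, query lengths) by mean reward over a sliding window of past trials, which is one way to model a "time horizon". All names (`log_odds_ratio_scores`, `build_query`, `HorizonLearner`) and the reward definition are hypothetical, not CorpusBuilder's actual code.

```python
import math
from collections import Counter, deque


def doc_freq(docs):
    """Count how many documents contain each term (docs are token lists)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def log_odds_ratio_scores(relevant, nonrelevant):
    """Assumed term score: smoothed log odds ratio of a term's document
    frequency in relevant (target-language) vs. non-relevant documents."""
    df_r, df_n = doc_freq(relevant), doc_freq(nonrelevant)
    n_r, n_n = len(relevant), len(nonrelevant)
    scores = {}
    for term in set(df_r) | set(df_n):
        # Add-0.5 smoothing keeps both probabilities strictly inside (0, 1).
        p = (df_r[term] + 0.5) / (n_r + 1)
        q = (df_n[term] + 0.5) / (n_n + 1)
        scores[term] = math.log(p * (1 - q) / ((1 - p) * q))
    return scores


def build_query(scores, n_include, n_exclude):
    """Include the highest-scoring terms; exclude the lowest-scoring ones.
    Ties are broken alphabetically so the output is deterministic."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    include = ranked[:n_include]
    # Exclusions come from the remaining terms, lowest score first,
    # so a term can never be both included and excluded.
    exclude = ranked[n_include:][::-1][:n_exclude]
    return " ".join(include + ["-" + t for t in exclude])


class HorizonLearner:
    """Choose among candidate query settings (e.g. query lengths) by mean
    reward over the last `horizon` trials; horizon=None keeps all history."""

    def __init__(self, actions, horizon=None):
        self.history = {a: deque(maxlen=horizon) for a in actions}

    def update(self, action, reward):
        # Hypothetical reward: fraction of new target-language documents found.
        self.history[action].append(reward)

    def best(self):
        # Untried settings score +inf so each is tried at least once.
        def mean(a):
            h = self.history[a]
            return sum(h) / len(h) if h else math.inf
        return max(self.history, key=mean)


if __name__ == "__main__":
    relevant = [["je", "in", "da"], ["je", "na", "da"]]
    nonrelevant = [["the", "and", "in"], ["the", "is"]]
    scores = log_odds_ratio_scores(relevant, nonrelevant)
    print(build_query(scores, n_include=2, n_exclude=1))  # prints: da je -the
```

The horizon changes which setting looks best when past performance drifts: if length-2 queries scored well early and poorly later, a learner with `horizon=None` still averages in the early successes, while one with `horizon=2` reacts only to the recent failures and switches to another length.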