Term-weighting schemes are vital to the performance of Information Retrieval models that use term frequency characteristics to determine the relevance of a document. The vector space model is one such model, in which the weights assigned to document terms are crucial to the accuracy of the retrieval system. This paper describes a genetic programming framework used to automatically determine term-weighting schemes that achieve high average precision. These schemes are tested on standard test collections and are shown to perform as well as, and often better than, the modern BM25 weighting scheme. We present an analysis of the evolved schemes to explain the increase in performance. Furthermore, we show that the global (collection-wide) part of the evolved weighting schemes also increases average precision over idf on larger TREC data. These global weighting schemes are shown to adhere to Luhn's resolving power, as middle-frequency terms are assigned the highest weight. However, the complete weighting schemes evolved on small collections do not perform as well on large collections. We conclude that in order to evolve improved local (within-document) weighting schemes, it is necessary to evolve them on large collections.
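
For orientation, the two baselines named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract gives no formulas, so the sketch assumes the commonly cited forms of idf and BM25. The function names, the parameter defaults `k1=1.2` and `b=0.75`, and the split into `global_part` and `local_part` are assumptions chosen to mirror the global (collection-wide) versus local (within-document) distinction drawn above.

```python
import math


def idf(num_docs, doc_freq):
    """Classic inverse document frequency, log(N / df) (assumed form)."""
    return math.log(num_docs / doc_freq)


def bm25_term_weight(tf, doc_freq, num_docs, doc_len, avg_doc_len,
                     k1=1.2, b=0.75):
    """Weight of one term in one document under a common BM25 formulation.

    k1 and b are conventional default values, assumed here for illustration.
    """
    # Global (collection-wide) part: depends only on document frequency.
    global_part = math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Local (within-document) part: saturating, length-normalised term frequency.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    local_part = (tf * (k1 + 1)) / (tf + k1 * length_norm)
    return global_part * local_part


if __name__ == "__main__":
    # Hypothetical collection statistics, for illustration only.
    print(idf(num_docs=1000, doc_freq=10))
    print(bm25_term_weight(tf=3, doc_freq=10, num_docs=1000,
                           doc_len=120, avg_doc_len=100))
```

In a sketch like this, an evolved scheme would replace `global_part`, `local_part`, or both with an expression discovered by genetic programming, while the product structure stays the same.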