LePrEF: Learn to precompute evidence fusion for efficient query evaluation

Authors:
André L. da Costa Carvalho;Cristian Rossi;Edleno S. de Moura;Altigran S. da Silva;David Fernandes
Affiliations:
Institute of Computing, Federal University of Amazonas, Manaus, AM, Brazil;Institute of Computing, Federal University of Amazonas, Manaus, AM, Brazil;Institute of Computing, Federal University of Amazonas, Manaus, AM, Brazil;Institute of Computing, Federal University of Amazonas, Manaus, AM, Brazil;Institute of Computing, Federal University of Amazonas, Manaus, AM, Brazil
Venue:
Journal of the American Society for Information Science and Technology
Year:
2012



Abstract

State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. This fusion of evidence is currently performed at query-processing time, which directly affects the response time of search systems. Previous research also shows that one way to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel approach to precomputing term impacts: a generic framework that uses a machine-learning technique to combine any set of distinct sources of evidence. The method retains the advantage of producing high-quality results but avoids the cost of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute, at indexing time and prior to query processing, a unified impact value for each term found in each document. Unlike previous research on precomputing term impacts, our method provides a generic framework that can precompute impacts from any set of relevance evidence on any text collection. The precomputed impact values are indexed and later used to compute the document ranking at query-processing time. As a result, our method effectively reduces query processing to simple additions of these impacts. We show that this approach, while producing results comparable to those of state-of-the-art ranking methods, can also lead to a significant decrease in computational costs during query processing. © 2012 Wiley Periodicals, Inc.
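
The idea of fusing evidence at indexing time so that query processing reduces to additions can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the article: the `fuse` function is a hand-written stand-in for whatever function genetic programming would evolve, and the evidence fields (`tf`, `idf`, `link_score`), the index layout, and the helpers `build_impact_index` and `search` are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def fuse(evidence):
    """Hypothetical stand-in for a learned fusion function.

    In the abstract the function is learned by genetic programming;
    this fixed formula is only an assumed example of its shape.
    """
    return math.log1p(evidence["tf"]) * evidence["idf"] + 0.5 * evidence["link_score"]


def build_impact_index(docs_evidence):
    """Indexing time: store one fused impact per (term, document) pair.

    docs_evidence maps doc_id -> {term: evidence dict}.
    Returns term -> list of (doc_id, impact) postings.
    """
    index = defaultdict(list)
    for doc_id, terms in docs_evidence.items():
        for term, evidence in terms.items():
            index[term].append((doc_id, fuse(evidence)))
    return index


def search(index, query_terms, k=10):
    """Query time: only look up postings and add precomputed impacts."""
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, impact in index.get(term, ()):
            scores[doc_id] += impact
    # Return the k documents with the highest summed impact.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    sample = {
        "d1": {
            "web": {"tf": 3, "idf": 1.0, "link_score": 2.0},
            "search": {"tf": 1, "idf": 2.0, "link_score": 2.0},
        },
        "d2": {"web": {"tf": 1, "idf": 1.0, "link_score": 0.0}},
    }
    for doc_id, score in search(build_impact_index(sample), ["web", "search"]):
        print(doc_id, round(score, 3))
```

In this sketch, all of the evidence combination happens inside `build_impact_index`; `search` never touches the raw evidence, which is the property the abstract attributes to precomputed impacts.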