Measurement Techniques and Caching Effects

Authors:
Stefan Pohl;Alistair Moffat
Affiliations:
NICTA Victoria Research Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia 3010;NICTA Victoria Research Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia 3010
Venue:
ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval
Year:
2009

Abstract

Overall query execution time consists of the time spent transferring data from disk to memory and the time spent performing the actual computation. Any measurement of overall time on a given hardware configuration aggregates these two costs, which makes results hard to reproduce and makes it hard to infer which of the two costs is actually affected by a proposed modification. In this paper we show that repeated submission of the same query provides a means to estimate the computational fraction of overall query execution time. The advantage of separate measurements is illustrated for a particular optimization that, as it turns out, reduces computational costs only. Finally, by replacing repeated query terms with surrogates that have similar document frequency, we are able to measure the natural caching effects that arise as a consequence of term repetitions in query logs.
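The two measurement ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `run_query`, `estimate_compute_fraction`, and `replace_repeated_terms` are hypothetical names, the first submission is assumed to start with cold caches, and the nearest-document-frequency rule for picking surrogates is an assumption about how "similar" might be decided.

```python
import statistics
import time
from typing import Callable, Dict, List


def estimate_compute_fraction(run_query: Callable[[str], object],
                              query: str,
                              repeats: int = 5,
                              clock: Callable[[], float] = time.perf_counter) -> Dict[str, float]:
    """Time a first submission, then repeats of the same query.

    Assumption: the first run pays disk transfer plus computation, while
    repeats find their data already cached and so approximate computation only.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    start = clock()
    run_query(query)
    first = clock() - start

    repeat_times = []
    for _ in range(repeats):
        start = clock()
        run_query(query)
        repeat_times.append(clock() - start)

    compute = statistics.median(repeat_times)
    fraction = min(1.0, compute / first) if first > 0 else 1.0
    return {"first": first, "repeat": compute, "fraction": fraction}


def replace_repeated_terms(queries: List[List[str]],
                           doc_freq: Dict[str, int]) -> List[List[str]]:
    """Replace each repeated term with an unused term of similar document frequency.

    Assumption: "similar" means the smallest absolute difference in document
    frequency, with ties broken alphabetically. Candidates are vocabulary terms
    that never occur in the log, and each surrogate is used at most once.
    """
    log_terms = {term for query in queries for term in query}
    candidates = set(doc_freq) - log_terms
    seen = set()
    rewritten = []
    for query in queries:
        new_query = []
        for term in query:
            if term in seen and term in doc_freq and candidates:
                target = doc_freq[term]
                surrogate = min(candidates,
                                key=lambda c: (abs(doc_freq[c] - target), c))
                candidates.remove(surrogate)
                new_query.append(surrogate)
            else:
                # First occurrence, unknown term, or no surrogate left: keep it.
                seen.add(term)
                new_query.append(term)
        rewritten.append(new_query)
    return rewritten
```

Running the same query stream once against the original log and once against the rewritten log, and comparing the timings, would then give a rough indication of how much term repetition contributes through caching.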