Exploiting query term correlation for list caching in web search engines

Authors:
Jiancong Tong;Gang Wang;Douglas S. Stones;Shizhao Sun;Xiaoguang Liu;Fan Zhang
Affiliations:
Nankai University, Tianjin, China;Nankai University, Tianjin, China;Monash University, Melbourne, Australia;Nankai University, Tianjin, China;Nankai University, Tianjin, China;Nankai University, Tianjin, China
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

Caching is widely used to boost the performance of Web search engines. Motivated by the correlation between terms in query logs from a commercial search engine, we explore a caching scheme based on pairs of terms rather than on individual terms, which is the typical approach in search engines today. We propose an inverted list caching policy, based on the Least Recently Used (LRU) method, that accounts for the co-occurrence correlation between terms in the query stream when deciding which terms to keep in the cache. We consider term co-occurrence not only within a single query but also across separate queries. Experimental results show that, compared with existing list caching algorithms, the proposed approach can improve both the cache hit ratio and the overall throughput of the system.
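
The abstract does not give the details of the policy, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a plain LRU cache of inverted lists in which accessing a term also refreshes the recency of the cached term that has co-occurred with it most often in the same query. The class name CorrelationAwareLRU, the fetch_list callback, and the "refresh the single strongest partner" rule are all hypothetical choices made for illustration; co-occurrence across separate queries is not modeled here.

```python
from collections import Counter, OrderedDict
from itertools import combinations


class CorrelationAwareLRU:
    """Hypothetical LRU list cache that also refreshes a correlated term."""

    def __init__(self, capacity, fetch_list):
        assert capacity >= 1
        self.capacity = capacity        # maximum number of cached inverted lists
        self.fetch_list = fetch_list    # assumed callback: term -> inverted list
        self.cache = OrderedDict()      # term -> inverted list, least recent first
        self.pair_counts = Counter()    # frozenset({a, b}) -> same-query co-occurrences
        self.hits = 0
        self.misses = 0

    def _strongest_partner(self, term):
        """Return the cached term that co-occurred most often with `term`, or None."""
        best, best_count = None, 0
        for other in self.cache:
            if other == term:
                continue
            count = self.pair_counts[frozenset((term, other))]
            if count > best_count:
                best, best_count = other, count
        return best

    def process_query(self, terms):
        """Return the inverted lists for a query, updating cache and statistics."""
        terms = list(dict.fromkeys(terms))  # drop duplicate terms, keep order
        for a, b in combinations(terms, 2):
            self.pair_counts[frozenset((a, b))] += 1

        lists = {}
        for term in terms:
            # Assumption: protect the most correlated cached term from eviction.
            partner = self._strongest_partner(term)
            if partner is not None:
                self.cache.move_to_end(partner)

            if term in self.cache:
                self.hits += 1
                self.cache.move_to_end(term)
            else:
                self.misses += 1
                self.cache[term] = self.fetch_list(term)
                if len(self.cache) > self.capacity:
                    self.cache.popitem(last=False)  # evict the least recent term
            lists[term] = self.cache[term]
        return lists
```

With a capacity of three and the query sequence "new york", "pizza", "york", "cats", plain LRU would evict "new" when "cats" arrives, whereas this sketch evicts "pizza" because the access to "york" also refreshed its partner "new". Whether such a rule helps in practice depends on the query stream; the abstract reports gains only for the authors' own policy.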