Document selection for tiered indexing in commerce search

Authors:
Debmalya Panigrahi; Sreenivas Gollapudi
Affiliations:
Microsoft Research, Redmond, WA, USA; Microsoft Research, Mountain View, CA, USA
Venue:
Proceedings of the sixth ACM international conference on Web search and data mining
Year:
2013

Abstract

A search engine aims to return a set of relevant documents in response to a query while minimizing response time. This has led to the use of tiered indexes, in which the search engine maintains a small cache of documents that can serve a large fraction of queries. We give a novel algorithm for selecting the documents in a tiered index for commerce search (i.e., users searching for products on the web) that exploits the richer structure of commerce search queries. This is in sharp contrast to previous approaches to tiered indexing, which were aimed at general web search, where queries are typically unstructured. We analyze our algorithm theoretically and give performance guarantees that hold even in worst-case scenarios. We then complement our theoretical claims with extensive experiments on real-world commerce search data, and show that our algorithm outperforms state-of-the-art tiered indexing techniques that were developed for general web search.
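
The abstract does not spell out the selection algorithm, so the following is only a rough illustration of the tiered-index idea it describes: choose a small set of documents for the first tier so that as much query traffic as possible can be answered from that tier alone. The sketch uses a generic greedy heuristic (traffic gained per newly cached document), which is not claimed to be the authors' method. The names `select_tier1`, `served_fraction`, the query-log layout, and the toy data are all hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical query log: query text -> (frequency, documents needed to answer it).
QueryLog = Dict[str, Tuple[int, FrozenSet[str]]]


def select_tier1(query_log: QueryLog, budget: int) -> Set[str]:
    """Greedily pick at most `budget` documents for the first tier.

    Assumption (not from the paper): a query counts as served only if
    every one of its documents is in the tier.
    """
    cache: Set[str] = set()
    remaining = dict(query_log)
    while remaining:
        # Queries whose documents are already cached are served for free.
        remaining = {q: v for q, v in remaining.items() if not v[1] <= cache}
        best_query, best_ratio = None, 0.0
        for query, (freq, docs) in remaining.items():
            new_docs = docs - cache
            if len(cache) + len(new_docs) > budget:
                continue  # this query does not fit in the remaining budget
            ratio = freq / len(new_docs)  # traffic gained per added document
            if ratio > best_ratio:
                best_query, best_ratio = query, ratio
        if best_query is None:
            break  # nothing else fits
        cache |= remaining.pop(best_query)[1]
    return cache


def served_fraction(query_log: QueryLog, cache: Set[str]) -> float:
    """Fraction of query traffic answerable from the first tier alone."""
    total = sum(freq for freq, _ in query_log.values())
    served = sum(freq for freq, docs in query_log.values() if docs <= cache)
    return served / total if total else 0.0


# Toy data, invented for illustration only.
toy_log: QueryLog = {
    "red shoes": (50, frozenset({"d1", "d2"})),
    "blue shoes": (30, frozenset({"d2", "d3"})),
    "laptop bag": (15, frozenset({"d4", "d5", "d6"})),
    "phone case": (5, frozenset({"d7"})),
}
tier1 = select_tier1(toy_log, budget=3)
print(sorted(tier1), served_fraction(toy_log, tier1))
```

In this toy log, overlapping result sets ("red shoes" and "blue shoes" share `d2`) let a three-document tier serve most of the traffic, which is the kind of structure a selection algorithm can exploit.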