Reordering an index to speed query processing without loss of effectiveness

Authors:
David Hawking; Timothy Jones
Affiliations:
Funnelback Pty Ltd., Australia, and Australian National University; Funnelback Pty Ltd., Australia
Venue:
Proceedings of the Seventeenth Australasian Document Computing Symposium
Year:
2012

Abstract

Following Long and Suel, we empirically investigate the importance of document order in search engines that rank documents using a combination of dynamic (query-dependent) and static (query-independent) scores and that use document-at-a-time (DAAT) processing. When inverted file postings are in collection order, assigning document numbers in order of descending static score supports lossless early termination while maintaining good compression. Since static scores may not be available until all documents have been gathered and indexed, we build a tool for reordering an existing index and show that it operates in less than 20% of the original indexing time. We note that this additional cost is easily recouped by savings at query processing time. We compare best early-termination points for several different index orders on three enterprise search test sets (a whole-of-government index with two very different query sets, and a collection from a UK university). We also present results for the same orders for ClueWeb09-CatB. Our evaluation focuses on finding results likely to be clicked on by users of Web or website search engines, that is, Nav and Key results in the TREC 2011 Web Track judging scheme. The orderings tested are Original, Reverse, Random, and QIE (descending order of static score). For the three enterprise search test sets, we find that QIE order can achieve close-to-maximal search effectiveness at much lower computational cost than the other orderings. Additionally, reordering has negligible impact on compressed index size for indexes that contain position information. Our results for an artificial query set against the TREC ClueWeb09 Category B collection are much more equivocal, and we canvass possible explanations for future investigation.
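
To illustrate why descending static-score order permits lossless early termination under DAAT processing, here is a minimal sketch in Python. It is not the authors' implementation. It assumes that a document's total score is the simple sum of its static score and its dynamic score, that dynamic contributions are non-negative, and that an upper bound on any document's dynamic score (`max_dynamic`) is known in advance. The function name, data layout, and integer scores are all hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def daat_top_k(postings, static_scores, k, max_dynamic):
    """Document-at-a-time top-k with early termination (illustrative only).

    postings:      dict term -> list of (doc_id, dynamic_contribution),
                   each list sorted by ascending doc_id.
    static_scores: list indexed by doc_id, non-increasing (assumed QIE order).
    max_dynamic:   assumed upper bound on any document's total dynamic score.
    Returns (ranked top-k as (score, doc_id) pairs, number of documents scored).
    """
    # Merge all postings lists into one stream ordered by doc_id.
    merged = heapq.merge(*postings.values())
    top = []      # min-heap holding the current best k (score, doc_id) pairs
    scored = 0    # how many documents were fully scored

    for doc_id, group in groupby(merged, key=itemgetter(0)):
        # No document from here on can beat static_scores[doc_id] + max_dynamic,
        # because static scores only decrease as doc_id grows.
        if len(top) == k and static_scores[doc_id] + max_dynamic <= top[0][0]:
            break  # the top-k cannot change, so stopping loses nothing
        score = static_scores[doc_id] + sum(c for _, c in group)
        scored += 1
        if len(top) < k:
            heapq.heappush(top, (score, doc_id))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, doc_id))

    ranked = sorted(top, key=lambda pair: (-pair[0], pair[1]))
    return ranked, scored


# Hypothetical toy index: six documents already numbered by descending static score.
static = [90, 80, 50, 30, 20, 10]
index = {
    "a": [(0, 30), (1, 10), (3, 20), (5, 30)],
    "b": [(1, 20), (2, 10), (4, 30), (5, 10)],
}
early, n_early = daat_top_k(index, static, k=2, max_dynamic=60)
full, n_full = daat_top_k(index, static, k=2, max_dynamic=float("inf"))
print(early, n_early)  # [(120, 0), (110, 1)] 2
print(full, n_full)    # [(120, 0), (110, 1)] 6
```

With an infinite bound the loop never terminates early, so the second call acts as an exhaustive baseline. On this toy data both calls return the same ranking, but the bounded call scores only two of the six documents. Under a different ordering (for example Random), the static score of the current document would say nothing about later ones, and this stopping test would no longer be safe.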