Optimized query execution in large search engines with global page ordering

Authors:
Xiaohui Long;Torsten Suel
Affiliations:
CIS Department, Polytechnic University, Brooklyn, NY;CIS Department, Polytechnic University, Brooklyn, NY
Venue:
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Year:
2003

Citing 34
Cited 48

Implementations of partial document ranking using inverted files

Information Processing and Management: an International Journal
Inverted File Partitioning Schemes in Multiple Disk Systems

IEEE Transactions on Parallel and Distributed Systems
Query evaluation: strategies and optimizations

Information Processing and Management: an International Journal
Filtered document retrieval with frequency-sorted indexes

Journal of the American Society for Information Science
Combining fuzzy information from multiple systems (extended abstract)

PODS '96 Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Optimization of inverted vector searches

SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Compressed inverted files with reduced decoding overheads

Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
Automatic resource compilation by analyzing hyperlink structure and associated text

WWW7 Proceedings of the seventh international conference on World Wide Web 7
The anatomy of a large-scale hypertextual Web search engine

WWW7 Proceedings of the seventh international conference on World Wide Web 7
Authoritative sources in a hyperlinked environment

Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
Managing gigabytes (2nd ed.): compressing and indexing documents and images

Managing gigabytes (2nd ed.): compressing and indexing documents and images
Evaluating the performance of distributed architectures for information retrieval using a variety of workloads

ACM Transactions on Information Systems (TOIS)
The stochastic approach for link-structure analysis (SALSA) and the TKC effect

Proceedings of the 9th international World Wide Web conference on Computer networks : the international journal of computer and telecommunications netowrking
Breadth-first crawling yields high-quality pages

Proceedings of the 10th international conference on World Wide Web
Building a distributed full-text index for the Web

Proceedings of the 10th international conference on World Wide Web
Finding authorities and hubs from link structures on the World Wide Web

Proceedings of the 10th international conference on World Wide Web
Optimal aggregation algorithms for middleware

PODS '01 Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Searching the Web

ACM Transactions on Internet Technology (TOIT)
Vector-space ranking with effective early termination

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Static index pruning for information retrieval systems

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Topic-sensitive PageRank

Proceedings of the 11th international conference on World Wide Web
Modern Information Retrieval

Modern Information Retrieval
Combining fuzzy information: an overview

ACM SIGMOD Record
Lessons from Giant-Scale Services

IEEE Internet Computing
Performance of Inverted Indices in Distributed Text Document Retrieval Systems

PDIS '93 Proceedings of the 2nd International Conference on Parallel and Distributed Information Systems
Optimizing Multi-Feature Queries for Image Databases

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Using Fagin's Algorithm for Merging Ranked Results in Multimedia Middleware

COOPIS '99 Proceedings of the Fourth IECIS International Conference on Cooperative Information Systems
On the Resemblance and Containment of Documents

SEQUENCES '97 Proceedings of the Compression and Complexity of Sequences 1997
Query Processing Issues in Image(Multimedia) Databases

ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Towards Efficient Multi-Feature Queries in Heterogeneous Environments

ITCC '01 Proceedings of the International Conference on Information Technology: Coding and Computing
Evaluating Top-k Queries over Web-Accessible Databases

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Design and Implementation of a High-Performance Distributed Web Crawler

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Scalable distributed architectures for information retrieval

Scalable distributed architectures for information retrieval
Optimizing result prefetching in web search engines with segmented indices

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Efficient Inverted Lists and Query Algorithms for Structured Value Ranking in Update-Intensive Relational Databases

ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Three-level caching for efficient query processing in large Web search engines

WWW '05 Proceedings of the 14th international conference on World Wide Web
Efficient and self-tuning incremental query expansion for top-k query processing

Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
An efficient and versatile query engine for TopX search

VLDB '05 Proceedings of the 31st international conference on Very large data bases
KLEE: a framework for distributed top-k query algorithms

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Static score bucketing in inverted indexes

Proceedings of the 14th ACM international conference on Information and knowledge management
Performance of query processing implementations in ranking-based text retrieval systems using inverted indices

Information Processing and Management: an International Journal
Efficient query processing in geographic web search engines

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
IO-Top-k: index-access optimized top-k query processing

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Architecture of a grid-enabled Web search engine

Information Processing and Management: an International Journal
Hybrid global-local indexing for effcient peer-to-peer information retrieval

NSDI'04 Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1
Pruning policies for two-tiered inverted index with correctness guarantee

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Efficiency-quality tradeoffs for vector score aggregation

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Top-k query evaluation with probabilistic guarantees

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
High performance index build algorithms for intranet search engines

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Effective top-k computation in retrieving structured documents with term-proximity support

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Best position algorithms for top-k queries

VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Incremental cluster-based retrieval using compressed cluster-skipping inverted files

ACM Transactions on Information Systems (TOIS)
ResIn: a combination of results caching and index pruning for high-performance web search engines
Relaxation in text search using taxonomies

Proceedings of the VLDB Endowment
Dynamic faceted search for discovery-driven analysis

Proceedings of the 17th ACM conference on Information and knowledge management
MedSearch: a specialized search engine for medical information retrieval

Proceedings of the 17th ACM conference on Information and knowledge management
Can phrase indexing help to process non-phrase queries?

Proceedings of the 17th ACM conference on Information and knowledge management
Top-k aggregation using intersections of ranked inputs

Proceedings of the Second ACM International Conference on Web Search and Data Mining
Inverted index compression and query processing with optimized document ordering

Proceedings of the 18th international conference on World wide web
A search-based method for forecasting ad impression in contextual advertising

Proceedings of the 18th international conference on World wide web
Nullification test collections for web spam and SEO

Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web
Effective top-k computation with term-proximity support

Information Processing and Management: an International Journal
Distributed top-k aggregation queries at large

Distributed and Parallel Databases
Leveraging a scalable row store to build a distributed text index

Proceedings of the first international workshop on Cloud data management
Probabilistic static pruning of inverted files

ACM Transactions on Information Systems (TOIS)
Revisiting globally sorted indexes for efficient document retrieval

Proceedings of the third ACM international conference on Web search and data mining
Beyond pages: supporting efficient, scalable entity search with dual-inversion index

Proceedings of the 13th International Conference on Extending Database Technology
Efficient term proximity search with term-pair indexes

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
A web search engine model based on index-query bit-level compression

Proceedings of the 1st International Conference on Intelligent Semantic Web-Services and Applications
Batch query processing for web search engines

Proceedings of the fourth ACM international conference on Web search and data mining
Faster top-k document retrieval using block-max indexes

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Efficiently encoding term co-occurrences in inverted indexes

Proceedings of the 20th ACM international conference on Information and knowledge management
Indexing shared content in information retrieval systems

EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology
Optimized top-k processing with global page scores on block-max indexes

Proceedings of the fifth ACM international conference on Web search and data mining
Index ordering by query-independent measures

Information Processing and Management: an International Journal
When big data leads to lost data

Proceedings of the 5th Ph.D. workshop on Information and knowledge
Processing continuous text queries featuring non-homogeneous scoring functions

Proceedings of the 21st ACM international conference on Information and knowledge management
Reordering an index to speed query processing without loss of effectiveness

Proceedings of the Seventeenth Australasian Document Computing Symposium
RASIM: a rank-aware separate index method for answering top-k spatial keyword queries

World Wide Web
Optimizing top-k document retrieval strategies for block-max indexes

Proceedings of the sixth ACM international conference on Web search and data mining
Development of a Novel Compressed Index-Query Web Search Engine Model

International Journal of Information Technology and Web Engineering
Efficient parallel block-max WAND algorithm

Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Large web search engines have to answer thousands of queries per second with interactive response times. A major factor in the cost of executing a query is given by the lengths of the inverted lists for the query terms, which increase with the size of the document collection and are often in the range of many megabytes. To address this issue, IR and database researchers have proposed pruning techniques that compute or approximate term-based ranking functions without scanning over the full inverted lists. Over the last few years, search engines have incorporated new types of ranking techniques that exploit aspects such as the hyperlink structure of the web or the popularity of a page to obtain improved results. We focus on the question of how such techniques can be efficiently integrated into query processing. In particular, we study pruning techniques for query execution in large engines in the case where we have a global ranking of pages, as provided by Pagerank or any other method, in addition to the standard term-based approach. We describe pruning schemes for this case and evaluate their efficiency on an experimental cluster-based search engine with million web pages. Our results show that there is significant potential benefit in such techniques.