Building and operating large-scale information retrieval systems used by hundreds of millions of people around the world presents a number of interesting challenges. Designing such systems requires making complex tradeoffs along several dimensions, including (a) the number of user queries that must be handled per second and the response latency for these requests, (b) the number and size of the various corpora that are searched, (c) the latency and frequency with which documents are updated or added to the corpora, and (d) the quality and cost of the ranking algorithms that are used for retrieval. In this talk, I will discuss the evolution of Google's hardware infrastructure and information retrieval systems and some of the design challenges that arise from ever-increasing demands in all of these dimensions. I will also describe how we use various pieces of distributed systems infrastructure when building these retrieval systems. Finally, I will describe some future challenges and open research problems in this area.