Maguro, a system for indexing and searching over very large text collections

Authors:
Knut Magne Risvik;Trishul Chilimbi;Henry Tan;Karthik Kalyanaraman;Chris Anderson
Affiliations:
Microsoft, Oslo, Norway;Microsoft, Redmond, USA;Microsoft, Redmond, USA;Microsoft, Redmond, USA;Microsoft, Redmond, USA
Venue:
Proceedings of the sixth ACM international conference on Web search and data mining
Year:
2013

Abstract

Maguro is a system for efficiently searching very large collections of text content, up to 1 trillion documents, at low cost. Search engines span content ranging from highly dynamic content that is heavily augmented with metadata to the tail content of the web. A long-tail distribution of content calls for different trade-offs in the design space to achieve good efficiency across the entire index range. Maguro is designed for the long tail of content, which is less dynamic and carries less metadata, and it targets very good cost efficiency. Maguro is part of the serving stack in Bing and allows us to scale the index significantly better.