How best to build web-scale data managers?

Authors:
Philip A. Bernstein;Daniel J. Abadi;Michael J. Cafarella;Joseph M. Hellerstein;Donald Kossmann;Samuel Madden
Affiliations:
Microsoft;Yale;U. of Washington;U.C. Berkeley;ETH Zürich;M.I.T.
Venue:
Proceedings of the VLDB Endowment
Year:
2009

Citing 5
Cited 0

MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6

Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007

Dynamo: Amazon's highly available key-value store
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles

Bigtable: A Distributed Storage System for Structured Data
ACM Transactions on Computer Systems (TOCS)

PNUTS: Yahoo!'s hosted data serving platform
Proceedings of the VLDB Endowment

Abstract

Many of the largest database-driven web sites use custom web-scale data managers (WDMs). On the surface, these WDMs are being applied to problems that are well-suited for relational database systems. Some examples are the following:

• Map-Reduce [5], Hadoop [7], and Dryad [9] are used to process queries on large data sets using sequential scan and aggregation. Hive [8] is a data warehouse built on Hadoop.
• Google's Bigtable [3] is used to store a replicated table of rows of semi-structured data.
• Amazon's Dynamo [6] is used to store partitioned, replicated databases of key-value pairs. Cassandra [2] is similar.
• Object caching systems, such as memcached [10], Oracle's Coherence, and Microsoft's Velocity project, are used instead of a persistent store.
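
As a rough illustration of the first example, the scan-and-aggregate pattern can be sketched in a few lines of Python. This is a single-process toy under stated assumptions, not the API of Map-Reduce, Hadoop, or Dryad: the names `map_phase`, `reduce_phase`, and `bytes_per_url`, and the sample log records, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Scan every record sequentially and emit (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def reduce_phase(pairs, reducer):
    """Group the emitted values by key, then aggregate each group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}


# Hypothetical page-view log: (url, bytes_served)
log = [("/home", 120), ("/about", 80), ("/home", 200)]


def bytes_per_url(record):
    """Mapper: emit the URL as the key and the bytes served as the value."""
    url, size = record
    yield url, size


totals = reduce_phase(map_phase(log, bytes_per_url), sum)
print(totals)
```

The same result could be written in SQL as `SELECT url, SUM(bytes_served) FROM log GROUP BY url`, which is one way to see why such workloads look, on the surface, well-suited for relational database systems.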