Peta-scale data warehousing at Yahoo!

Authors:
Mona Ahuja;Cheng Che Chen;Ravi Gottapu;Jörg Hallmann;Waqar Hasan;Richard Johnson;Maciek Kozyrczak;Ramesh Pabbati;Neeta Pandit;Sreenivasulu Pokuri;Krishna Uppala
Affiliations:
Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Sunnyvale, CA, USA;Yahoo!, Sunnyvale, CA, USA
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year:
2009



Abstract

Insights based on detailed data on consumer behavior, product performance, and marketplace behavior are driving innovation and competition in the Internet space. We introduce Everest, a SQL-compliant data warehousing engine based on a column architecture, which we have built and deployed at Yahoo!. In contrast to commercially available engines, this massively parallel engine runs on commodity hardware and offers scale, flexibility, specialized analytic operations, and lower administrative and hardware costs. In this paper, we describe the business motivation for Everest and its software and deployment architecture. The engine has been in production at Yahoo! since 2007 and currently manages over six petabytes of data.