Peta-scale data warehousing at Yahoo!

  • Authors:
  • Mona Ahuja;Cheng Che Chen;Ravi Gottapu;Jörg Hallmann;Waqar Hasan;Richard Johnson;Maciek Kozyrczak;Ramesh Pabbati;Neeta Pandit;Sreenivasulu Pokuri;Krishna Uppala

  • Affiliations:
  • Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Bellevue, WA, USA;Yahoo!, Sunnyvale, CA, USA;Yahoo!, Sunnyvale, CA, USA

  • Venue:
  • Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
  • Year:
  • 2009

Abstract

Insights based on detailed data on consumer behavior, product performance and marketplace behavior are driving innovation and competition in the internet space. We introduce Everest, a SQL-compliant data warehousing engine based on a column architecture, which we have built and deployed at Yahoo!. In contrast to commercially available engines, this massively parallel engine, based on commodity hardware, offers scale, flexibility, specialized analytic operations, and lower administrative and hardware costs. In this paper, we describe the business motivation and the software and deployment architecture of Everest. The engine has been in production at Yahoo! since 2007 and currently manages over six petabytes of data.