Shark: fast data analysis using coarse-grained distributed memory

  • Authors:
  • Cliff Engle; Antonio Lupher; Reynold Xin; Matei Zaharia; Michael J. Franklin; Scott Shenker; Ion Stoica

  • Affiliations:
  • University of California Berkeley, Berkeley, CA, USA (all authors)

  • Venue:
  • SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 2012

Abstract

Shark is a research data analysis system built on a novel coarse-grained distributed shared-memory abstraction. Shark marries query processing with deep data analysis, providing a unified system for easy data manipulation using SQL and pushing sophisticated analysis closer to data. It scales to thousands of nodes in a fault-tolerant manner. Shark can answer queries 40X faster than Apache Hive and run machine learning programs 25X faster than MapReduce programs in Apache Hadoop on large datasets.