Workload diversity and dynamics in big data analytics: implications to system designers

  • Authors:
  • Jichuan Chang;Kevin T. Lim;John Byrne;Laura Ramirez;Parthasarathy Ranganathan

  • Affiliations:
  • Hewlett Packard Labs, Palo Alto;Hewlett Packard Labs, Palo Alto;Hewlett Packard Labs, Palo Alto;Hewlett Packard Labs, Palo Alto;Hewlett Packard Labs, Palo Alto

  • Venue:
  • Proceedings of the 2nd Workshop on Architectures and Systems for Big Data
  • Year:
  • 2012

Abstract

The emergence of big data analytics and the need for cost- and energy-efficient IT infrastructure motivate a new focus on data-centric designs. In this paper, we aim to better understand the design implications of data analytics systems by quantifying workload requirements and runtime dynamics. We examine four workloads representing big data analytics trends for fast decisions, total integration, deep analysis and fresh insights: an archive store, a columnar database enhanced with table compression, an analytics engine with distributed R, and a transaction/analytics hybrid system. These applications demonstrate diverse resource requirements both within and across workloads, as well as load imbalance due to data skew. Our observations suggest several directions for designing balanced data analytics systems, including tight integration of heterogeneous, active data stores, support for efficient communication, and data-centric load balancing.