Navigating big data with high-throughput, energy-efficient data partitioning

  • Authors:
  • Lisa Wu;Raymond J. Barker;Martha A. Kim;Kenneth A. Ross

  • Affiliations:
  • Columbia University, New York;Columbia University, New York;Columbia University, New York;Columbia University, New York

  • Venue:
  • Proceedings of the 40th Annual International Symposium on Computer Architecture
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

The global pool of data is growing at 2.5 quintillion bytes per day, with 90% of it produced in the last two years alone [24]. There is no doubt the era of big data has arrived. This paper explores targeted deployment of hardware accelerators to improve the throughput and energy efficiency of large-scale data processing. In particular, data partitioning is a critical operation for manipulating large data sets. It is often the limiting factor in database performance and represents a significant fraction of the overall runtime of large data queries. To accelerate partitioning, this paper describes a hardware accelerator for range partitioning, or HARP, and a hardware-software data streaming framework. The streaming framework offers a seamless execution environment for streaming accelerators such as HARP. Together, HARP and the streaming framework provide an order of magnitude improvement in partitioning performance and energy. A detailed analysis of a 32nm physical design shows 7.8 times the throughput of a highly optimized and optimistic software implementation, while consuming just 6.9% of the area and 4.3% of the power of a single Xeon core in the same technology generation.