Navigating big data with high-throughput, energy-efficient data partitioning

Authors:
Lisa Wu;Raymond J. Barker;Martha A. Kim;Kenneth A. Ross
Affiliations:
Columbia University, New York;Columbia University, New York;Columbia University, New York;Columbia University, New York
Venue:
Proceedings of the 40th Annual International Symposium on Computer Architecture
Year:
2013

Citing 40
Cited 4

PipeRench: a co/processor for streaming multimedia acceleration

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Implementing database operations using SIMD instructions

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
B-Tree Indexes and CPU Caches

Proceedings of the 17th International Conference on Data Engineering
DBMSs on a Modern Processor: Where Does Time Go?

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
The Reconfigurable Streaming Vector Processor (RSVP™)

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Memory Controller Optimizations for Web Servers

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A New Task Model for Streaming Applications and Its Schedulability Analysis

Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
A study of performance impact of memory controller features in multi-processor server environment

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Distributed Stream Management using Utility-Driven Self-Adaptive Middleware

ICAC '05 Proceedings of the Second International Conference on Automatic Computing
Efficient relational database management using graphics processors

DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
Design, implementation, and evaluation of the linear road benchmark on the stream processing core

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Partitioned optimization of complex queries

Information Systems
Effective Management of DRAM Bandwidth in Multicore Processors

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
A Burst Scheduling Access Reordering Mechanism

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Data partitioning on chip multiprocessors

Proceedings of the 4th international workshop on Data management on new hardware
Architectural support for SWAR text processing with parallel bit streams: the inductive doubling principle

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Optimal splitters for database partitioning with size bounds

Proceedings of the 12th International Conference on Database Theory
k-ary search on modern processors

Proceedings of the Fifth International Workshop on Data Management on New Hardware
Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs

Proceedings of the VLDB Endowment
Intra-Socket and Inter-Socket Communication in Multi-core Systems

IEEE Computer Architecture Letters
Server Engineering Insights for Large-Scale Online Services

IEEE Micro
Complex event detection at wire speed with FPGAs

Proceedings of the VLDB Endowment
S4: Distributed Stream Computing Platform

ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops
ReMAP: A Reconfigurable Heterogeneous Multicore Architecture

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Impact of recent hardware and software trends on high performance transaction processing and analytics

TPCTC'10 Proceedings of the Second TPC technology conference on Performance evaluation, measurement and characterization of complex systems
MemScale: active low-power modes for main memory

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Design and evaluation of main memory hash join algorithms for multi-core CPUs

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Scalable aggregation on multicore processors

Proceedings of the Seventh International Workshop on Data Management on New Hardware
The impact of memory subsystem resource sharing on datacenter applications

Proceedings of the 38th annual international symposium on Computer architecture
Dynamically Specialized Datapaths for energy efficient computing

HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Fast updates on read-optimized databases using multi-core CPUs

Proceedings of the VLDB Endowment
Toward Dark Silicon in Servers

IEEE Micro
Parallel application memory scheduling

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Virtualizing stream processing

Middleware'11 Proceedings of the 12th ACM/IFIP/USENIX international conference on Middleware
Parabix: Boosting the efficiency of text processing on commodity processors

HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
Accelerating business analytics applications

HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
Towards energy-proportional datacenter memory with mobile DRAM

Proceedings of the 39th Annual International Symposium on Computer Architecture

Large-reach memory management unit caches

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Meet the walkers: accelerating index traversals for in-memory databases

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Q100: the architecture and design of a database processing unit

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Abstract

The global pool of data is growing at 2.5 quintillion bytes per day, with 90% of it produced in the last two years alone [24]. There is no doubt the era of big data has arrived. This paper explores targeted deployment of hardware accelerators to improve the throughput and energy efficiency of large-scale data processing. In particular, data partitioning is a critical operation for manipulating large data sets. It is often the limiting factor in database performance and represents a significant fraction of the overall runtime of large data queries. To accelerate partitioning, this paper describes a hardware accelerator for range partitioning, or HARP, and a hardware-software data streaming framework. The streaming framework offers a seamless execution environment for streaming accelerators such as HARP. Together, HARP and the streaming framework provide an order of magnitude improvement in partitioning performance and energy. A detailed analysis of a 32nm physical design shows 7.8 times the throughput of a highly optimized and optimistic software implementation, while consuming just 6.9% of the area and 4.3% of the power of a single Xeon core in the same technology generation.
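
For readers unfamiliar with the operation, the following is a minimal software sketch of range partitioning, the task that HARP accelerates in hardware. It is not the paper's implementation: the function name `range_partition`, the use of a sorted splitter list, and the convention that a key equal to a splitter falls into the higher partition are all assumptions made for illustration.

```python
from bisect import bisect_right


def range_partition(keys, splitters):
    """Distribute keys into len(splitters) + 1 partitions.

    `splitters` must be sorted in ascending order. Partition i receives
    keys k with splitters[i-1] <= k < splitters[i]; this boundary
    convention is an assumption of this sketch, not taken from the paper.
    """
    partitions = [[] for _ in range(len(splitters) + 1)]
    for key in keys:
        # Binary search finds how many splitters are <= key,
        # which is the index of the partition the key belongs to.
        partitions[bisect_right(splitters, key)].append(key)
    return partitions


if __name__ == "__main__":
    # Two splitters define three ranges: (<10), [10, 30), (>=30).
    print(range_partition([5, 42, 17, 99, 23, 8], [10, 30]))
    # prints [[5, 8], [17, 23], [42, 99]]
```

Each key costs one binary search over the splitters plus a write to its output partition, so the per-record work is small but is repeated for every record in the input. That per-record comparison loop is the kind of work the abstract describes moving into a streaming accelerator.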