Scalable aggregation on multicore processors

  • Authors: Yang Ye; Kenneth A. Ross; Norases Vesdapunt

  • Affiliations: Columbia University, New York NY; Columbia University, New York NY; Columbia University, New York NY

  • Venue: Proceedings of the Seventh International Workshop on Data Management on New Hardware
  • Year: 2011

Abstract

In data-intensive and multi-threaded programming, the performance bottleneck has shifted from I/O bandwidth to main memory bandwidth. The availability, size, and other properties of on-chip cache strongly influence performance. A key question is whether to allow different threads to work independently, or whether to coordinate the shared workload among the threads. The independent approach avoids synchronization overhead, but requires resources proportional to the number of threads and thus is not scalable. On the other hand, the shared method suffers from coordination overhead and potential contention. In this paper, we aim to provide a solution to performing in-memory parallel aggregation on the Intel Nehalem architecture. We consider several previously proposed techniques that were evaluated on other architectures, including a hybrid independent/shared method and a method that clones data items automatically when contention is detected. We also propose two algorithms: partition-and-aggregate and PLAT. The PLAT and hybrid methods perform best overall, utilizing the computational power of multiple threads without needing memory proportional to the number of threads, and avoiding much of the coordination overhead and contention apparent in the shared table method.
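To make the independent-versus-shared trade-off concrete, the following is a minimal sketch of the two baseline strategies for a grouped SUM. It is not the authors' implementation: the function names, the round-robin split of rows across threads, and the single global lock guarding the shared table are all assumptions made for illustration. In CPython, threads will not reproduce the cache and memory-bandwidth effects the paper measures; the sketch only shows how the two approaches are structured and where their costs arise.

```python
import threading
from collections import defaultdict


def aggregate_independent(rows, num_threads):
    """Independent approach: one private table per thread, merged at the end.

    No synchronization is needed while aggregating, but memory use grows
    with the number of threads because every thread may hold every key.
    """
    local_tables = [defaultdict(int) for _ in range(num_threads)]

    def worker(tid):
        table = local_tables[tid]
        # Hypothetical work split: thread tid takes every num_threads-th row.
        for key, value in rows[tid::num_threads]:
            table[key] += value

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Final merge of the per-thread tables into one result.
    merged = defaultdict(int)
    for table in local_tables:
        for key, value in table.items():
            merged[key] += value
    return dict(merged)


def aggregate_shared(rows, num_threads):
    """Shared approach: all threads update a single table.

    Memory use is independent of the thread count, but every update pays
    for coordination. A single global lock is an assumption made here for
    simplicity; it makes the contention cost easy to see.
    """
    table = defaultdict(int)
    lock = threading.Lock()

    def worker(tid):
        for key, value in rows[tid::num_threads]:
            with lock:  # coordination point shared by all threads
                table[key] += value

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(table)
```

Both functions take a list of `(key, value)` pairs and return the same mapping from each key to its sum; they differ only in where the cost falls. The independent version allocates `num_threads` tables, while the shared version serializes every update through one lock, which is the kind of coordination overhead and contention the abstract attributes to the shared table method.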