In data-intensive, multi-threaded programming, the performance bottleneck has shifted from I/O bandwidth to main memory bandwidth, and the availability, size, and other properties of on-chip cache strongly influence performance. A key question is whether to let threads work independently or to coordinate a shared workload among them. The independent approach avoids synchronization overhead but requires resources proportional to the number of threads, and thus does not scale. The shared approach, on the other hand, suffers from coordination overhead and potential contention. In this paper, we aim to provide a solution for performing in-memory parallel aggregation on the Intel Nehalem architecture. We consider several previously proposed techniques that were evaluated on other architectures, including a hybrid independent/shared method and a method that clones data items automatically when contention is detected. We also propose two algorithms: partition-and-aggregate and PLAT. The PLAT and hybrid methods perform best overall: they utilize the computational power of multiple threads without needing memory proportional to the number of threads, and they avoid much of the coordination overhead and contention apparent in the shared table method.
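To make the independent versus shared trade-off concrete, the following is a minimal sketch of a SUM-by-key aggregation written both ways. It is not the paper's implementation: the function names `aggregate_independent` and `aggregate_shared`, the `(key, value)` record layout, and the single global lock are all assumptions chosen for illustration. Because of Python's global interpreter lock, the sketch shows only the structure of each approach, not its memory-bandwidth or cache behavior.

```python
import threading


def _run_workers(worker, num_threads):
    """Start one thread per worker id and wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(tid,)) for tid in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def aggregate_independent(records, num_threads):
    """Independent approach (illustrative): one private table per thread, merged at the end.

    No synchronization is needed while aggregating, but memory use grows with
    the number of threads because every thread may hold every key.
    """
    local_tables = [{} for _ in range(num_threads)]

    def worker(tid):
        table = local_tables[tid]
        # Each thread processes a disjoint, strided slice of the input.
        for key, value in records[tid::num_threads]:
            table[key] = table.get(key, 0) + value

    _run_workers(worker, num_threads)

    # Merge step: combine the per-thread tables into one result.
    result = {}
    for table in local_tables:
        for key, value in table.items():
            result[key] = result.get(key, 0) + value
    return result


def aggregate_shared(records, num_threads):
    """Shared approach (illustrative): one table for all threads, guarded by a lock.

    Memory use does not depend on the thread count, but every update pays
    coordination overhead and threads contend for the same lock.
    """
    shared_table = {}
    lock = threading.Lock()  # assumption: a single coarse lock, for simplicity

    def worker(tid):
        for key, value in records[tid::num_threads]:
            with lock:
                shared_table[key] = shared_table.get(key, 0) + value

    _run_workers(worker, num_threads)
    return shared_table


if __name__ == "__main__":
    sample = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    print(aggregate_independent(sample, num_threads=2))
    print(aggregate_shared(sample, num_threads=2))
```

Both functions compute the same totals. They differ in where the cost lands: the independent version pays in per-thread memory and a final merge, while the shared version pays on every update through the lock.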