Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Improving index performance through prefetching
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Implementing database operations using SIMD instructions
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Fractal prefetching B+-Trees: optimizing both cache and disk performance
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Database Architecture Optimized for the New Bottleneck: Memory Access
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
DBMSs on a Modern Processor: Where Does Time Go?
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Cache Conscious Algorithms for Relational Query Processing
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Improving server software support for simultaneous multithreaded processors
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Improving Hash Join Performance through Prefetching
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Database Systems Concepts
Improving database performance on simultaneous multithreading processors
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Multithreaded architectures and the sort benchmark
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
Generic database cost models for hierarchical memory systems
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Pipelined hash-join on multithreaded architectures
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Parallel buffers for chip multiprocessors
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Cache-oblivious databases: Limitations and opportunities
ACM Transactions on Database Systems (TODS)
Exploiting multithreaded architectures to improve the hash join operation
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Hash Join Optimization Based on Shared Cache Chip Multi-processor
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
Fast and compact hash tables for integer keys
ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91
Design and evaluation of main memory hash join algorithms for multi-core CPUs
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
Hi-index | 0.00 |
As the performance gap between main memory and modern processors widens, database algorithms must be adapted to be "architecture-aware" for optimal performance. We address this issue using the computation of hash join, one of the most important operations in database query processing, to study the impact of simultaneous multithreading (SMT) and main-memory latency (cache misses) on performance.Prior work [8] has studied cache misses on a simulation based on the Compaq ES40. Our results are obtained by measuring the performance of actual hardware (Intel Pentium and Xeon, and AMD Opteron) first for the single-threaded version of the hash-join algorithm used in the prior work and a new version designed for multiple threads.We found that hardware prefetching from main-memory data into CPU cache as implemented in the architectures we tested significantly reduces the real-world benefit of software prefetching (contrary to prior work on simulated systems). We found that SMT achieved significant speedup for our thread-aware hash join algorithm when compared with a single-threaded execution on the same single processor. Software prefetching also proved beneficial in this environment.