The Gamma Database Machine Project
IEEE Transactions on Knowledge and Data Engineering
Optimizing Main-Memory Join on Modern Hardware
IEEE Transactions on Knowledge and Data Engineering
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Accelerating database operators using a network processor
DaMoN '05 Proceedings of the 1st international workshop on Data management on new hardware
Improving hash join performance through prefetching
ACM Transactions on Database Systems (TODS)
Why you should run TPC-DS: a workload analysis
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Sort vs. Hash revisited: fast join implementation on modern multi-core CPUs
Proceedings of the VLDB Endowment
Glacier: a query-to-hardware compiler
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Understanding sources of inefficiency in general-purpose chips
Proceedings of the 37th annual international symposium on Computer architecture
Design and evaluation of main memory hash join algorithms for multi-core CPUs
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Memory power management via dynamic voltage/frequency scaling
Proceedings of the 8th ACM international conference on Autonomic computing
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
Efficient complex operators for irregular codes
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Dynamically Specialized Datapaths for energy efficient computing
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Toward Dark Silicon in Servers
IEEE Micro
Bundled execution of recurring traces for energy-efficient general purpose processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
QsCores: trading dark silicon for scalable energy efficiency with quasi-specific cores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual International Symposium on Computer Architecture
Vector Extensions for Decision Support DBMS Acceleration
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Navigating big data with high-throughput, energy-efficient data partitioning
Proceedings of the 40th Annual International Symposium on Computer Architecture
LINQits: big data on little clients
Proceedings of the 40th Annual International Symposium on Computer Architecture
Main-memory hash joins on multi-core CPUs: Tuning to the underlying hardware
ICDE '13 Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013)
Hi-index | 0.00 |
The explosive growth in digital data and its growing role in real-time decision support motivate the design of high-performance database management systems (DBMSs). Meanwhile, slowdown in supply voltage scaling has stymied improvements in core performance and ushered an era of power-limited chips. These developments motivate the design of DBMS accelerators that (a) maximize utility by accelerating the dominant operations, and (b) provide flexibility in the choice of DBMS, data layout, and data types. We study data analytics workloads on contemporary in-memory databases and find hash index lookups to be the largest single contributor to the overall execution time. The critical path in hash index lookups consists of ALU-intensive key hashing followed by pointer chasing through a node list. Based on these observations, we introduce Widx, an on-chip accelerator for database hash index lookups, which achieves both high performance and flexibility by (1) decoupling key hashing from the list traversal, and (2) processing multiple keys in parallel on a set of programmable walker units. Widx reduces design cost and complexity through its tight integration with a conventional core, thus eliminating the need for a dedicated TLB and cache. An evaluation of Widx on a set of modern data analytics workloads (TPC-H, TPC-DS) using full-system simulation shows an average speedup of 3.1x over an aggressive OoO core on bulk hash table operations, while reducing the OoO core energy by 83%.