Database management systems (DBMS) have become an essential tool for industry and research and are often a significant component of data centres. Efficient execution of DBMS engines has therefore become an important area of investigation. This work takes a top-down approach to accelerating decision support systems (DSS) on x86-64 microprocessors using vector ISA extensions. In the first step, a leading DSS DBMS is analysed for potential data-level parallelism. We discuss why the existing multimedia SIMD extensions (SSE/AVX) are not suitable for capturing this parallelism and propose a complementary instruction set reminiscent of classical vector architectures. The instruction set is implemented using non-intrusive modifications to a modern x86-64 microarchitecture tailored for DSS DBMS. The ISA and microarchitecture are evaluated using a cycle-accurate x86-64 microarchitectural simulator coupled with a highly detailed memory simulator. We find that a single operator is responsible for 41% of total execution time for the TPC-H DSS benchmark. Our results show speedups between 1.94x and 4.56x for an implementation of this operator run with our proposed hardware modifications.
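To put the reported figures in context, the sketch below applies Amdahl's law to estimate what an operator-level speedup could mean for whole-benchmark runtime. This is an illustration, not a result from the work itself: it assumes the 41% share and the 1.94x to 4.56x operator speedups can be combined directly and that the remaining 59% of execution time is unchanged. The function name `overall_speedup` is hypothetical.

```python
def overall_speedup(fraction: float, operator_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only `fraction`
    of the original runtime is accelerated by `operator_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / operator_speedup)


# Share of TPC-H execution time attributed to the single operator (from the abstract).
FRACTION = 0.41

# Lower and upper operator speedups quoted in the abstract.
for s in (1.94, 4.56):
    # Assumption: the rest of the query plan runs at its original speed.
    print(f"operator {s:.2f}x -> overall {overall_speedup(FRACTION, s):.2f}x")

# Ceiling if the operator cost vanished entirely.
print(f"upper bound: {1.0 / (1.0 - FRACTION):.2f}x")
```

Under these assumptions the estimate comes out at roughly 1.25x to 1.47x overall, with a ceiling near 1.69x, which shows why the share of time spent in the accelerated operator matters as much as the operator speedup itself.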