Implementing database operations using SIMD instructions

Authors: Jingren Zhou, Kenneth A. Ross
Affiliations: Columbia University (both authors)
Venue: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data
Year: 2002

Abstract

Modern CPUs have instructions that allow basic operations to be performed on several data elements in parallel. These instructions are called SIMD instructions, since they apply a single instruction to multiple data elements. SIMD technology was initially built into commodity processors to accelerate multimedia applications. SIMD instructions provide new opportunities for database engine design and implementation. We study various kinds of operations in a database context and show how the inner loop of each operation can be accelerated using SIMD instructions. The use of SIMD instructions has two immediate performance benefits: it allows a degree of parallelism, so that many operands can be processed at once, and it often leads to the elimination of conditional branch instructions, reducing branch mispredictions.

We consider the most important database operations, including sequential scans, aggregation, index operations, and joins, and present techniques for implementing them using SIMD instructions. We show that there are significant benefits in redesigning traditional query processing algorithms so that they can make better use of SIMD technology. Our study shows that, using a SIMD parallelism of four, the new algorithms reduce CPU time relative to the traditional algorithms by amounts ranging from 10% to more than a factor of four. Speedups that exceed the degree of parallelism (superlinear speedups) are obtained as a result of eliminating branch misprediction effects.
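The two benefits named above, lane-parallel comparison and replacing branches with masks, can be illustrated with a small sketch. It is not the authors' code: it assumes 32-bit integer columns and expresses a SIMD parallelism of four with the GCC/Clang `vector_size` extension rather than any specific instruction set, and the function names `simd_sum_where_less` (a sequential scan with aggregation) and `simd_node_search` (an index node lookup) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four 32-bit lanes in one 128-bit vector (GCC/Clang vector extension),
 * i.e. a SIMD parallelism of four. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Sequential scan with aggregation: SELECT SUM(y) WHERE x < c.
 * A lane-wise comparison yields -1 (all ones) where x[i] < c and 0 elsewhere;
 * ANDing that mask with y keeps only qualifying values, so the loop body
 * contains no data-dependent branch. Assumes per-lane sums fit in 32 bits. */
int64_t simd_sum_where_less(const int32_t *x, const int32_t *y, size_t n, int32_t c)
{
    const v4si cv = {c, c, c, c};      /* c broadcast to every lane */
    v4si acc = {0, 0, 0, 0};           /* four running partial sums */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        v4si xv, yv;
        memcpy(&xv, x + i, sizeof xv); /* alignment-safe loads */
        memcpy(&yv, y + i, sizeof yv);
        v4si mask = xv < cv;           /* -1 where the predicate holds, else 0 */
        acc += yv & mask;
    }

    int64_t total = (int64_t)acc[0] + acc[1] + acc[2] + acc[3];

    /* Scalar tail, still branch-free: -(x[i] < c) is -1 or 0. */
    for (; i < n; i++)
        total += y[i] & -(int32_t)(x[i] < c);

    return total;
}

/* Index node search: count how many of a node's sorted keys are < key,
 * which is the slot of the child to descend into. Four keys are compared
 * per vector operation and the results are summed rather than branched on.
 * Assumes nkeys is a multiple of 4 (e.g. the node is padded with INT32_MAX). */
size_t simd_node_search(const int32_t *keys, size_t nkeys, int32_t key)
{
    assert(nkeys % 4 == 0);
    const v4si kv = {key, key, key, key};
    v4si count = {0, 0, 0, 0};

    for (size_t i = 0; i < nkeys; i += 4) {
        v4si block;
        memcpy(&block, keys + i, sizeof block);
        v4si lt = block < kv;          /* -1 for each key smaller than the search key */
        count -= lt;                   /* subtracting -1 adds one per match */
    }
    return (size_t)(count[0] + count[1] + count[2] + count[3]);
}
```

For example, with `x = {1, 5, 2, 8, 3, 9, 0}`, `y = {10, 20, 30, 40, 50, 60, 70}` and `c = 4`, `simd_sum_where_less(x, y, 7, 4)` returns 160, and for a node holding `{10, 20, ..., 70, INT32_MAX}` a search for 35 returns slot 3. The node search deliberately compares every key instead of stopping at the first larger one: a few extra comparisons buy a fixed instruction sequence with no hard-to-predict branch.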