Vectorized data processing on the cell broadband engine

Authors:
Sándor Héman;Niels Nes;Marcin Zukowski;Peter Boncz
Affiliations:
CWI, Kruislaan, Amsterdam, The Netherlands;CWI, Kruislaan, Amsterdam, The Netherlands;CWI, Kruislaan, Amsterdam, The Netherlands;CWI, Kruislaan, Amsterdam, The Netherlands
Venue:
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
Year:
2007

Abstract

In this work, we investigate the suitability of the Cell Broadband Engine for database processing. We start by outlining the main architectural features of Cell and use micro-benchmarks to characterize the latency and throughput of its memory infrastructure. Then, we discuss the challenges of porting RDBMS software to Cell: (i) all computations need to be SIMD-ized, (ii) all performance-critical branches need to be eliminated, and (iii) a very small, hard limit on program code size must be respected. While we argue that conventional database implementations, i.e., row-stores with Volcano-style tuple pipelining, are a hard fit for Cell, it turns out that the three challenges are quite easily met in databases that use column-wise processing. We implemented a proof-of-concept port of the vectorized query processing model of MonetDB/X100 on Cell by running the operator pipeline on the PowerPC core while executing the data-parallel vectorized primitives on its SPE cores. A performance evaluation on TPC-H Q1 shows that vectorized query processing on Cell can beat conventional PowerPC and Itanium2 CPUs by a factor of 20.
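
To make challenge (ii) and the idea of a vectorized primitive more concrete, here is a minimal sketch in portable C. It is not code from MonetDB/X100 or from the Cell port. The function names `sel_lt_int` and `sum_int_sel` and the use of a selection vector of positions are assumptions made for illustration only. The sketch uses plain scalar C rather than SPE intrinsics, so it shows branch elimination and column-at-a-time processing, not SIMD-ization itself.

```c
#include <stddef.h>

/*
 * Hypothetical selection primitive (illustrative name, not from the paper).
 * Scans a vector of n column values and writes the positions of all values
 * smaller than `bound` into `sel`, which must have room for n entries.
 * The loop body has no data-dependent branch: the position is always stored,
 * and the output cursor advances by 0 or 1 depending on the comparison.
 */
size_t sel_lt_int(size_t n, const int *col, int bound, size_t *sel)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        sel[k] = i;             /* unconditional store, overwritten on a miss */
        k += (col[i] < bound);  /* comparison yields 0 or 1 */
    }
    return k;                   /* number of qualifying positions */
}

/*
 * Hypothetical aggregation primitive (illustrative name, not from the paper).
 * Sums the column values at the n_sel positions listed in `sel`.
 */
long sum_int_sel(size_t n_sel, const int *col, const size_t *sel)
{
    long sum = 0;
    for (size_t j = 0; j < n_sel; j++) {
        sum += col[sel[j]];
    }
    return sum;
}
```

In a setup like the one the abstract describes, an operator pipeline would call such primitives once per vector of values rather than once per tuple, which keeps each primitive small and gives it a tight loop with no per-tuple interpretation overhead. How the actual port partitions this work between the PowerPC core and the SPEs is not specified here.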