Executing stream joins on the cell processor

Authors:
Buǧra Gedik;Philip S. Yu;Rajesh R. Bordawekar
Affiliations:
Thomas J. Watson Research Center, IBM Research, Hawthorne, NY;Thomas J. Watson Research Center, IBM Research, Hawthorne, NY;Thomas J. Watson Research Center, IBM Research, Hawthorne, NY
Venue:
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Year:
2007

Abstract

Low-latency and high-throughput processing are key requirements of data stream management systems (DSMSs). Hence, multi-core processors that provide high aggregate processing capacity are ideal matches for executing costly DSMS operators. The recently developed Cell processor is a good example of a heterogeneous multi-core architecture and provides a powerful platform for executing data stream operators with high performance. On the downside, exploiting the full potential of a multi-core processor like Cell is often challenging, mainly due to the heterogeneous nature of the processing elements, the software-managed local memory at the co-processor side, and the unconventional programming model in general. In this paper, we study the problem of scalable execution of windowed stream join operators on multi-core processors, and specifically on the Cell processor. By examining various aspects of join execution flow, we determine the right set of techniques to apply in order to minimize the sequential segments and maximize parallelism. Concretely, we show that basic windows coupled with low-overhead pointer-shifting techniques can be used to achieve efficient join window partitioning, column-oriented join window organization can be used to minimize scattered data transfers, delay-optimized double buffering can be used for effective pipelining, rate-aware batching can be used to balance join throughput and tuple delay, and finally SIMD (single-instruction multiple-data) optimized operator code can be used to exploit data parallelism.
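To make the window-partitioning idea more concrete, the following is a minimal Python sketch, not the paper's Cell implementation. It splits a join window into fixed-capacity basic windows stored column-wise, expires old data by dropping whole basic windows from the head of a deque (a stand-in for pointer shifting, since no tuples are copied), and hands out basic windows round-robin to a hypothetical number of workers. All names (`BasicWindow`, `JoinWindow`, `probe`, `join_tuple`), the round-robin assignment, and the sequential loop over partitions are assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class BasicWindow:
    """One fixed-capacity slice of a join window, stored column-wise."""
    timestamps: List[float] = field(default_factory=list)
    values: List[float] = field(default_factory=list)


class JoinWindow:
    """A time-based join window made of basic windows (illustrative only)."""

    def __init__(self, window_seconds: float, capacity: int) -> None:
        assert capacity >= 1
        self.window_seconds = window_seconds
        self.capacity = capacity
        self.basic_windows = deque([BasicWindow()])

    def insert(self, ts: float, value: float) -> None:
        newest = self.basic_windows[-1]
        if len(newest.timestamps) >= self.capacity:
            newest = BasicWindow()
            self.basic_windows.append(newest)
        newest.timestamps.append(ts)
        newest.values.append(value)

    def expire(self, now: float) -> None:
        # Drop a basic window only once its newest tuple has expired.
        # Only the head of the deque moves; no tuple data is copied.
        cutoff = now - self.window_seconds
        while (len(self.basic_windows) > 1
               and self.basic_windows[0].timestamps[-1] < cutoff):
            self.basic_windows.popleft()

    def partition(self, num_workers: int) -> List[List[BasicWindow]]:
        # Assumed policy: round-robin assignment of basic windows to workers.
        windows = list(self.basic_windows)
        return [windows[i::num_workers] for i in range(num_workers)]


def probe(part: List[BasicWindow], ts: float, value: float,
          window_seconds: float,
          predicate: Callable[[float, float], bool]) -> List[Tuple[float, float]]:
    """Scan one partition; the timestamp check filters tuples that have
    expired inside a basic window that is still partly live."""
    matches = []
    for bw in part:
        for other_ts, other_value in zip(bw.timestamps, bw.values):
            if ts - other_ts <= window_seconds and predicate(value, other_value):
                matches.append((other_ts, other_value))
    return matches


def join_tuple(window: JoinWindow, ts: float, value: float, num_workers: int,
               predicate: Callable[[float, float], bool]) -> List[Tuple[float, float]]:
    """Join one arriving tuple against the opposite stream's window."""
    window.expire(ts)
    results = []
    # On real hardware each partition would be scanned by its own co-processor;
    # here the partitions are simply scanned one after another.
    for part in window.partition(num_workers):
        results.extend(probe(part, ts, value, window.window_seconds, predicate))
    return sorted(results)
```

Because expiry works at basic-window granularity, the oldest surviving basic window may still hold a few expired tuples, which is why `probe` rechecks timestamps. A smaller basic-window capacity reduces that wasted work at the cost of more bookkeeping.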
Our experimental results show that, following the design guidelines and implementation techniques outlined in this paper, windowed stream joins can achieve high scalability (linear in the number of co-processors) by making efficient use of the extensive hardware parallelism provided by the Cell processor (reaching data processing rates of ≈ 13 GB/sec) and significantly surpass the performance obtained from conventional high-end processors (supporting a combined input stream rate of 2000 tuples/sec using 15-minute windows and without dropping any tuples, resulting in ≈ 8.3 times higher output rate compared to an SSE implementation on dual 3.2 GHz Intel Xeon).
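As a rough sense of scale for the workload quoted above, the short Python calculation below estimates how many tuple comparisons per second a nested-loop windowed join would need. It rests on two assumptions that the abstract does not state: the combined rate of 2000 tuples/sec is split evenly between two streams, and every arriving tuple is compared against every tuple in the opposite stream's window. The function name is hypothetical.

```python
def nested_loop_comparisons_per_second(combined_rate: float,
                                       window_seconds: float) -> float:
    """Estimate comparisons/sec for a two-stream nested-loop window join."""
    per_stream_rate = combined_rate / 2              # assumption: even split
    window_tuples = per_stream_rate * window_seconds  # tuples held per window
    # Every arriving tuple (on either stream) scans the opposite window.
    return combined_rate * window_tuples


comparisons = nested_loop_comparisons_per_second(2000, 15 * 60)
print(f"{comparisons:.2e} comparisons/sec")  # prints "1.80e+09 comparisons/sec"
```

Under these assumptions each window holds 900,000 tuples, so the join performs about 1.8 billion comparisons per second. This is the kind of load for which spreading the window scan across co-processors matters.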