Evaluation of streaming aggregation on parallel hardware architectures

Authors:
Scott Schneider;Henrique Andrade;Buğra Gedik;Kun-Lung Wu;Dimitrios S. Nikolopoulos
Affiliations:
Virginia Tech, Blacksburg, VA and T.J. Watson Research Center, IBM Research, Hawthorne, NY;T.J. Watson Research Center, IBM Research, Hawthorne, NY;T.J. Watson Research Center, IBM Research, Hawthorne, NY;T.J. Watson Research Center, IBM Research, Hawthorne, NY;Foundation for Research and Technology--Hellas, Heraklion, Greece
Venue:
Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems
Year:
2010

Abstract

We present a case study of parallelizing streaming aggregation on three different parallel hardware architectures. Aggregation is a performance-critical operation for data summarization in stream computing and is commonly found in sense-and-respond applications. Currently available commodity parallel hardware shows promise as an accelerator for streaming aggregation. However, how streaming aggregation maps to the different parallel architectures is still an open question. Streaming aggregation is obviously data parallel, but in practice its performance depends more on efficient data movement than on computation, as we demonstrate. Furthermore, we use workloads such as stock market data, which introduce unique data distribution problems. The three parallel architectures in our study are an Intel Core 2 Quad processor, an Nvidia GTX 285 GPU, and the IBM PowerXCell 8i, an enhanced version of the Cell Broadband Engine architecture. Our implementations use OpenMP, CUDA, and Cellgen (a compiler providing OpenMP-like support on Cell), respectively. We find that the Cell's programmable local storage and its low-latency, high-bandwidth access to main memory are best suited to parallelizing streaming aggregation. Future GPUs can overcome their latency and bandwidth limitations by being fully integrated into the system's memory hierarchy. To attain good performance on existing parallel architectures, we find that developers must characterize their problem in terms of communication versus computation costs; memory access patterns, including whether their algorithms reuse data; and the granularity of data access patterns.
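
To make the data-parallel structure of streaming aggregation concrete, the following is a minimal sketch and not the authors' implementation. It assumes count-based tumbling windows over (symbol, price, volume) tuples and aggregates each window independently; the tuple layout, the window policy, and all function names are hypothetical and are not taken from the paper. Note how little arithmetic each tuple needs compared with the work of slicing and handing out the windows, which is in the spirit of the abstract's point that data movement tends to dominate.

```python
from concurrent.futures import ThreadPoolExecutor


def aggregate_window(window):
    """Summarize one window of (symbol, price, volume) tuples per symbol.

    Returns {symbol: (count, min_price, max_price, vwap)}.
    The tuple layout is an assumption made for illustration.
    """
    acc = {}
    for symbol, price, volume in window:
        if symbol not in acc:
            # [count, min, max, sum(price * volume), sum(volume)]
            acc[symbol] = [0, price, price, 0.0, 0]
        a = acc[symbol]
        a[0] += 1
        a[1] = min(a[1], price)
        a[2] = max(a[2], price)
        a[3] += price * volume
        a[4] += volume
    return {
        s: (a[0], a[1], a[2], a[3] / a[4] if a[4] else 0.0)
        for s, a in acc.items()
    }


def tumbling_windows(stream, size):
    """Split a list of tuples into consecutive, non-overlapping windows."""
    for start in range(0, len(stream), size):
        yield stream[start:start + size]


def parallel_aggregate(stream, size, workers=4):
    """Aggregate every window independently; results keep window order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(aggregate_window, tumbling_windows(stream, size)))


if __name__ == "__main__":
    ticks = [
        ("IBM", 10.0, 100),
        ("IBM", 12.0, 300),
        ("AAPL", 5.0, 10),
        ("IBM", 11.0, 100),
    ]
    for i, summary in enumerate(parallel_aggregate(ticks, size=2)):
        print(i, summary)
```

The thread pool here only illustrates the decomposition: windows share no state, so each one can go to a separate worker. On the platforms named in the abstract the same decomposition would be expressed with OpenMP, CUDA, or Cellgen, and the cost of moving each window to the worker is what the sketch leaves out.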