Performance Evaluation of Task Pools Based on Hardware Synchronization

Authors:
Ralf Hoffmann; Matthias Korch; Thomas Rauber
Affiliations:
University of Bayreuth; University of Bayreuth; University of Bayreuth
Venue:
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Year:
2004

Abstract

Task-based execution provides a universal approach to dynamic load balancing for irregular applications. Tasks are arbitrary units of work that are created dynamically at run time and stored in a parallel data structure, the task pool, until they are scheduled onto a processor for execution. In this paper, we evaluate the performance of different task pool implementations for shared-memory computer systems using several realistic applications. We consider task pools with different data structures, different load balancing strategies, and specialized memory management. In particular, we use synchronization operations based on hardware support that is available on many modern microprocessors. We show that the resulting task pool implementations achieve much better performance than implementations that use Pthreads library calls for synchronization. The applications considered are parallel quicksort, volume rendering, ray tracing, and hierarchical radiosity. The target machines are an IBM p690 server and a SunFire 6800.
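
The task pool idea described in the abstract can be illustrated with a minimal sketch: a central pool from which worker threads take tasks, applied to quicksort, where each task may create new tasks at run time. This is not the authors' implementation. The names `TaskPool`, `quicksort_task`, and the `cutoff` parameter are hypothetical, and the pool is protected by a library-level lock (Python's `threading.Condition`), which corresponds more closely to the Pthreads-style baseline than to the hardware-based synchronization the paper evaluates.

```python
import threading
from collections import deque


class TaskPool:
    """Central task pool (hypothetical API): one shared LIFO behind one lock."""

    def __init__(self):
        self._tasks = deque()
        self._cond = threading.Condition()  # library-level lock, not a hardware primitive
        self._pending = 0  # tasks that are stored or currently executing

    def put(self, func, *args):
        """Store a new task; may be called from inside a running task."""
        with self._cond:
            self._tasks.append((func, args))
            self._pending += 1
            self._cond.notify()

    def _get(self):
        """Return the next task, or None once no work remains anywhere."""
        with self._cond:
            while not self._tasks and self._pending > 0:
                self._cond.wait()
            if not self._tasks:
                return None
            return self._tasks.pop()

    def _done(self):
        """Mark one task as finished and wake idle workers at termination."""
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._cond.notify_all()

    def run(self, num_workers):
        """Execute tasks with num_workers threads until the pool drains."""
        def worker():
            while True:
                task = self._get()
                if task is None:
                    return
                func, args = task
                try:
                    func(*args)
                finally:
                    self._done()

        threads = [threading.Thread(target=worker) for _ in range(num_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()


def quicksort_task(pool, data, lo, hi, cutoff=32):
    """Sort data[lo:hi] in place; each partition becomes a new task."""
    if hi - lo <= cutoff:
        data[lo:hi] = sorted(data[lo:hi])  # small ranges are sorted directly
        return
    pivot = data[(lo + hi) // 2]
    i, j = lo, hi - 1
    while i <= j:
        while data[i] < pivot:
            i += 1
        while data[j] > pivot:
            j -= 1
        if i <= j:
            data[i], data[j] = data[j], data[i]
            i += 1
            j -= 1
    # The two partitions cover disjoint index ranges, so they can run concurrently.
    pool.put(quicksort_task, pool, data, lo, j + 1, cutoff)
    pool.put(quicksort_task, pool, data, i, hi, cutoff)


if __name__ == "__main__":
    values = [(k * 37) % 101 for k in range(1000)]
    pool = TaskPool()
    pool.put(quicksort_task, pool, values, 0, len(values))
    pool.run(num_workers=4)
    print(values == sorted(values))
```

Counting pending tasks (stored plus executing) lets idle workers distinguish a temporarily empty pool from completed work, since a running task may still create new tasks. Every `put` and every task removal passes through the same lock, so the cost of that one synchronization operation is paid on each pool access.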