Task-based execution provides a universal approach to dynamic load balancing for irregular applications. Tasks are arbitrary units of work that are created dynamically at run time and stored in a parallel data structure, the task pool, until they are scheduled onto a processor for execution. In this paper, we evaluate the performance of different task pool implementations for shared-memory computer systems using several realistic applications. We consider task pools with different data structures, different load balancing strategies, and specialized memory management. In particular, we use synchronization operations based on hardware support that is available on many modern microprocessors. We show that the resulting task pool implementations perform much better than implementations that use Pthreads library calls for synchronization. The applications considered are parallel quicksort, volume rendering, ray tracing, and hierarchical radiosity. The target machines are an IBM p690 server and a SunFire 6800.
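To make the task pool idea concrete, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes a single central pool protected by a spinlock built on `std::atomic_flag`, whose test-and-set typically compiles to an atomic hardware instruction, as a stand-in for the hardware-supported synchronization mentioned above. The names `SpinLock`, `TaskPool`, `quicksort_task`, and `parallel_sort`, as well as the cutoff of 64 elements, are hypothetical and chosen only for illustration. Parallel quicksort serves as the example because its tasks create further tasks at run time.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical spinlock: busy-waits on an atomic test-and-set
// instead of calling into a mutex library.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock()   { while (flag_.test_and_set(std::memory_order_acquire)) { /* spin */ } }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Hypothetical central task pool: one shared container, LIFO retrieval.
class TaskPool {
public:
    using Task = std::function<void(TaskPool&)>;

    // Store a task; tasks may call put() themselves to create new work.
    void put(Task t) {
        pending_.fetch_add(1);          // count the task before it becomes visible
        lock_.lock();
        tasks_.push_back(std::move(t));
        lock_.unlock();
    }

    // Start the workers and return once every task has finished.
    void run(unsigned num_workers) {
        assert(num_workers > 0);
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < num_workers; ++i)
            workers.emplace_back([this] { work(); });
        for (auto& w : workers) w.join();
    }

private:
    void work() {
        // pending_ counts tasks that were created but have not finished,
        // so it reaches zero only when no work is left anywhere.
        while (pending_.load() > 0) {
            Task t;
            lock_.lock();
            if (!tasks_.empty()) {
                t = std::move(tasks_.back());
                tasks_.pop_back();
            }
            lock_.unlock();
            if (!t) { std::this_thread::yield(); continue; }
            t(*this);                   // may put() child tasks
            pending_.fetch_sub(1);      // the task itself is now done
        }
    }

    SpinLock lock_;
    std::vector<Task> tasks_;
    std::atomic<long> pending_{0};
};

// Sorts a[lo, hi): small ranges directly, large ranges by spawning two subtasks.
void quicksort_task(TaskPool& pool, std::vector<int>& a, std::size_t lo, std::size_t hi) {
    if (hi - lo <= 64) {                // illustrative cutoff
        std::sort(a.begin() + lo, a.begin() + hi);
        return;
    }
    const int pivot = a[lo + (hi - lo) / 2];
    auto first = a.begin() + lo;
    auto last = a.begin() + hi;
    // Three-way partition: [< pivot | == pivot | > pivot].
    auto mid1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
    auto mid2 = std::partition(mid1, last, [pivot](int x) { return x == pivot; });
    const std::size_t m1 = static_cast<std::size_t>(mid1 - a.begin());
    const std::size_t m2 = static_cast<std::size_t>(mid2 - a.begin());
    pool.put([&a, lo, m1](TaskPool& p) { quicksort_task(p, a, lo, m1); });
    pool.put([&a, m2, hi](TaskPool& p) { quicksort_task(p, a, m2, hi); });
}

// Convenience wrapper: sorts a copy of the input with the given number of workers.
std::vector<int> parallel_sort(std::vector<int> data, unsigned num_workers) {
    TaskPool pool;
    const std::size_t n = data.size();
    pool.put([&data, n](TaskPool& p) { quicksort_task(p, data, 0, n); });
    pool.run(num_workers);
    return data;
}
```

Calling `parallel_sort(values, 4)` returns a sorted copy of `values` using four worker threads. Replacing `SpinLock` with `std::mutex` yields a library-lock variant of the same pool, which is the kind of comparison the abstract describes. A single central pool is only one of several possible designs, and distributed pools with task stealing are another load balancing strategy that this sketch does not cover.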