UTS: an unbalanced tree search benchmark

Authors:
Stephen Olivier;Jun Huan;Jinze Liu;Jan Prins;James Dinan;P. Sadayappan;Chau-Wen Tseng
Affiliations:
Dept. of Computer Science, Univ. of North Carolina at Chapel Hill;Dept. of Computer Science, Univ. of North Carolina at Chapel Hill;Dept. of Computer Science, Univ. of North Carolina at Chapel Hill;Dept. of Computer Science, Univ. of North Carolina at Chapel Hill;Dept. of Computer Science and Engineering, The Ohio State Univ.;Dept. of Computer Science and Engineering, The Ohio State Univ.;Dept. of Computer Science, Univ. of Maryland at College Park
Venue:
LCPC'06: Proceedings of the 19th International Conference on Languages and Compilers for Parallel Computing
Year:
2006

Abstract

This paper presents an unbalanced tree search (UTS) benchmark designed to evaluate the performance and ease of programming of parallel applications that require dynamic load balancing. We describe algorithms for building a variety of unbalanced search trees that simulate different forms of load imbalance. We created versions of UTS in two parallel languages, OpenMP and Unified Parallel C (UPC), using work stealing as the mechanism for reducing load imbalance. We benchmarked UTS on various parallel architectures, including shared-memory systems and PC clusters. We found it simple to implement UTS in both UPC and OpenMP, due to UPC's shared-memory abstractions. Results show that both UPC and OpenMP can support efficient dynamic load balancing on shared-memory architectures. However, UPC cannot alleviate the underlying communication costs of distributed-memory systems. Because dynamic load balancing requires intensive communication, performance portability remains difficult for applications such as UTS, and performance degrades on PC clusters. By varying key work-stealing parameters, we expose important tradeoffs between the granularity of load balance, the degree of parallelism, and communication costs.
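
The two ideas in the abstract, a deterministically generated unbalanced tree and work stealing with a tunable granularity, can be illustrated with a minimal Python sketch. It is not the authors' OpenMP or UPC code. The hash-based node identifiers, the branching rule, the constants `ROOT_BRANCH`, `M`, and `Q`, and the `chunk` parameter are all assumptions made for illustration, and the "workers" are simulated round-robin in a single thread rather than run in parallel.

```python
import hashlib
from collections import deque

# Hypothetical parameters, chosen only for illustration.
ROOT_BRANCH = 50  # number of children of the root
M = 4             # number of children of a non-leaf, non-root node
Q = 0.2           # probability that a non-root node has M children (M * Q < 1)

def child_id(parent: bytes, index: int) -> bytes:
    """Derive a child's identifier from its parent's identifier and child index."""
    return hashlib.sha1(parent + index.to_bytes(4, "big")).digest()

def num_children(node: bytes, depth: int) -> int:
    """The root has ROOT_BRANCH children; other nodes have M with probability Q, else 0."""
    if depth == 0:
        return ROOT_BRANCH
    u = int.from_bytes(node[:4], "big") / 2**32  # pseudo-random value in [0, 1)
    return M if u < Q else 0

def expand(node: bytes, depth: int):
    """Return the (identifier, depth) pairs of a node's children."""
    return [(child_id(node, i), depth + 1) for i in range(num_children(node, depth))]

def count_sequential(root: bytes) -> int:
    """Count all nodes with a plain depth-first traversal."""
    stack, total = [(root, 0)], 0
    while stack:
        node, depth = stack.pop()
        total += 1
        stack.extend(expand(node, depth))
    return total

def count_with_stealing(root: bytes, workers: int = 4, chunk: int = 2):
    """Simulate work stealing: an idle worker takes `chunk` nodes from a victim.

    Returns the per-worker node counts and the number of successful steals.
    """
    stacks = [deque() for _ in range(workers)]
    stacks[0].append((root, 0))
    counts, steals = [0] * workers, 0
    while any(stacks):
        for w in range(workers):
            if not stacks[w]:
                # Idle worker: scan the other workers for one with surplus work.
                for offset in range(1, workers):
                    victim = stacks[(w + offset) % workers]
                    if len(victim) > chunk:
                        # Steal the oldest entries, leaving the victim's newest ones.
                        for _ in range(chunk):
                            stacks[w].append(victim.popleft())
                        steals += 1
                        break
            if stacks[w]:
                node, depth = stacks[w].pop()
                counts[w] += 1
                stacks[w].extend(expand(node, depth))
    return counts, steals

root = hashlib.sha1(b"uts-root").digest()  # hypothetical root seed
total = count_sequential(root)
counts, steals = count_with_stealing(root, workers=4, chunk=2)
print(total, counts, steals)
```

Because every node's children depend only on its identifier, any traversal order visits the same tree, so the per-worker counts always sum to the sequential count. Raising `chunk` moves more work per steal and tends to reduce the number of steals, while lowering it spreads work more finely. That is one simple way to see the kind of tradeoff between load-balance granularity and communication cost that the abstract refers to.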