In this article, we consider analytical techniques for predicting detailed performance characteristics of a single shared-memory parallel program for a particular input. Analytical models for parallel programs have been successful at providing simple qualitative insights and bounds on program scalability, but have been less successful in practice at providing detailed insights and metrics for program performance (leaving these to measurement or simulation). We develop a conceptually simple modeling technique called deterministic task graph analysis that provides detailed performance prediction for shared-memory programs with arbitrary task graphs, a wide variety of task scheduling policies, and significant communication and resource contention. Unlike many previous models, which are stochastic, our model assumes deterministic task execution times (while retaining the use of stochastic models for communication and resource contention). This assumption is supported by a previous study of the influence of nondeterministic delays in parallel programs.

We evaluate our model in three ways. First, an experimental evaluation shows that our analysis technique is accurate and efficient for a variety of shared-memory programs, including programs with large and/or complex task graphs, sophisticated task scheduling, highly nonuniform task times, and significant communication and resource contention. The results also show that the deterministic assumption is crucial to permit accurate yet efficient analysis of these programs. Second, we use three example programs to illustrate the predictive capabilities of the model. In two cases, broad insights and detailed metrics from the model are used to suggest improvements in load balancing, and the model quickly and accurately predicts the impact of these changes. In the third case, the model provides novel insights into the impact of program design changes that improve communication locality as well as load balancing, via new (but general-purpose) metrics. Finally, we present results from a comparison of our model and representative stochastic models, and use these to characterize the conditions under which a deterministic model or stochastic models would be appropriate.
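
To make the idea of deterministic task times concrete, the following is a minimal illustrative sketch, not the article's technique. It assumes a task graph with fixed (deterministic) task times, a simple FIFO greedy scheduler, and no communication or contention costs, all of which are simplifications chosen for illustration. The function name `predict_makespan` and the example fork-join graph are hypothetical.

```python
import heapq


def predict_makespan(task_times, deps, num_procs):
    """Predict completion time of a task graph under greedy FIFO scheduling.

    Assumptions (illustrative only): every task has a fixed execution time,
    and communication and resource contention are ignored.

    task_times: dict mapping task name -> deterministic execution time
    deps: dict mapping task name -> list of predecessor task names
    num_procs: number of identical processors (must be >= 1)
    """
    # Count unfinished predecessors and build successor lists.
    remaining = {t: len(deps.get(t, [])) for t in task_times}
    succs = {t: [] for t in task_times}
    for task, preds in deps.items():
        for p in preds:
            succs[p].append(task)

    ready = sorted(t for t, n in remaining.items() if n == 0)  # FIFO queue
    running = []  # min-heap of (finish_time, task)
    now = 0.0
    free = num_procs
    finish = {}

    while ready or running:
        # Start ready tasks while processors are free.
        while ready and free > 0:
            task = ready.pop(0)
            heapq.heappush(running, (now + task_times[task], task))
            free -= 1
        # Advance the clock to the next task completion.
        now, done = heapq.heappop(running)
        finish[done] = now
        free += 1
        # Release successors whose predecessors have all finished.
        for s in succs[done]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return now, finish


# Hypothetical fork-join graph with nonuniform task times.
times = {"fork": 1, "a": 4, "b": 2, "c": 2, "join": 1}
deps = {"a": ["fork"], "b": ["fork"], "c": ["fork"], "join": ["a", "b", "c"]}

for procs in (1, 2, 3):
    makespan, _ = predict_makespan(times, deps, procs)
    print(procs, "processors ->", makespan)
```

Because the task times are fixed, a single pass over the graph gives the predicted completion time for a given processor count. In this example the long task `a` bounds the result, so adding a third processor does not help. A fuller model in the spirit of the abstract would additionally account for communication and resource contention, which this sketch leaves out.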