Scaling Parallel Programs for Multiprocessors: Methodology and Examples

Authors:
Jaswinder Pal Singh;John L. Hennessy;Anoop Gupta
Venue:
Computer
Year:
1993

Citing 7
Cited 36

A fast algorithm for particle simulations

Journal of Computational Physics
Reevaluating Amdahl's law

Communications of the ACM
The effect of time constraints on scaled speedup

SIAM Journal on Scientific and Statistical Computing
Scalability of parallel machines

Communications of the ACM
Finding and exploiting parallelism in an ocean simulation program: experience, results, and implications

Journal of Parallel and Distributed Computing
Load balancing and data locality in adaptive hierarchical N-body methods: Barnes-Hut, fast multipole, and radiosity

Journal of Parallel and Distributed Computing
The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture

Working sets, cache sizes, and node granularity issues for large-scale multiprocessors

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An empirical comparison of the Kendall Square Research KSR-1 and Stanford DASH multiprocessors

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Modeling communication in parallel algorithms: a fruitful interaction between theory and systems?

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Cost/performance of a parallel computer simulator

PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
How to Measure, Present, and Compare Parallel Performance

IEEE Parallel & Distributed Technology: Systems & Technology
The performance advantages of integrating block data transfer in cache-coherent multiprocessors

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Implications of hierarchical N-body methods for multiprocessor architectures

ACM Transactions on Computer Systems (TOCS)
Future applicability of bus-based shared memory multiprocessors

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Future applicability of bus-based shared memory multiprocessors

Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Towards modeling the performance of a fast connected components algorithm on parallel machines

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Application and architectural bottlenecks in large scale distributed shared memory machines

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Thread scheduling for cache locality

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Modeling cost/performance of a parallel computer simulator

ACM Transactions on Modeling and Computer Simulation (TOMACS)
Tuning compiler optimizations for simultaneous multithreading

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A methodology and an evaluation of the SGI Origin2000

SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors

IEEE Transactions on Parallel and Distributed Systems
Portable and Efficient Parallel Computing Using the BSP Model

IEEE Transactions on Computers
Tuning Compiler Optimizations for Simultaneous Multithreading

International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
Scalable Parallel Genetic Algorithms

Artificial Intelligence Review
Relationships Between Efficiency and Execution Time of Full Multigrid Methods on Parallel Computers

IEEE Transactions on Parallel and Distributed Systems
Next Generation System Software for Future High-End Computing Systems

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Performance of Scheduling Scientific Applications with Adaptive Weighted Factoring

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
The Forgotten Factor: Facts on Performance Evaluation and Its Dependence on Workloads

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A Methodology for User-Oriented Scalability Analysis

ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
Banyan: A Language for Scalable Parallel Programming on Loosely Coupled Distributed Systems

HPC-ASIA '97 Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
Integrated control of distributed volume visualization through the World-Wide-Web

VIS '94 Proceedings of the conference on Visualization '94
Dyn-MPI: Supporting MPI on Non Dedicated Clusters

Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Dyn-MPI: Supporting MPI on medium-scale, non-dedicated clusters

Journal of Parallel and Distributed Computing
Distributed filaments: efficient fine-grain parallelism on a cluster of workstations

OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
A regression-based approach to scalability prediction

Proceedings of the 22nd annual international conference on Supercomputing
Scheduling Parallel Tasks with Communication Overhead in an Environment with Multiple Machines

IEICE - Transactions on Information and Systems
How to simulate 1000 cores

ACM SIGARCH Computer Architecture News
Scalability analysis of parallel systems with multiple components of work

EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
Capacity metric for chip heterogeneous multiprocessors

CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A simplified contact-friction methodology for modeling wire breaks in parallel wire strands

Computers and Structures
A study of average-case speedup and scalability of parallel computations on static networks

Mathematical and Computer Modelling: An International Journal

Abstract

Models for the constraints under which an application should be scaled, including constant problem-size scaling, memory-constrained scaling, and time-constrained scaling, are reviewed. A realistic method is described that scales all relevant parameters under considerations imposed by the application domain. This method leads to different conclusions about the effectiveness and design of large multiprocessors than the naive practice of scaling only the data set size. The primary example application is a simulation of galaxies using the Barnes-Hut hierarchical N-body method.
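
To make the three scaling models named in the abstract concrete, here is a minimal Python sketch of how the particle count n might grow with the processor count p under each one. It is not the paper's method. It assumes that only n is scaled (the naive practice the abstract argues against), that Barnes-Hut work per time step grows as n log n, that memory grows linearly in n, and that parallel efficiency is perfect. All function names are hypothetical.

```python
import math


def work(n):
    """Assumed work per time step, proportional to n * log2(n)."""
    return n * math.log2(n)


def scale_constant(n0, p):
    """Constant problem-size scaling: n stays fixed as p grows."""
    return n0


def scale_memory_constrained(n0, p):
    """Memory-constrained scaling: memory is assumed linear in n,
    so filling p times the memory means p times the particles."""
    return n0 * p


def scale_time_constrained(n0, p):
    """Time-constrained scaling: find the largest n whose work, spread
    over p processors, does not exceed the one-processor work for n0.
    Assumes perfect speedup and n0 >= 2."""
    target = p * work(n0)
    lo, hi = n0, n0 * p  # work(lo) <= target <= work(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if work(mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    n0 = 1024
    for p in (1, 4, 16, 64):
        print(p,
              scale_constant(n0, p),
              scale_memory_constrained(n0, p),
              scale_time_constrained(n0, p))
```

Under these assumptions the time-constrained problem size grows more slowly than the memory-constrained one, because the n log n work outpaces the linear memory footprint. Scaling other application parameters as well, which is what the abstract advocates, would change these growth rates.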