On the Automatic Parallelization of the Perfect Benchmarks®

Authors:
Rudolf Eigenmann;Jay Hoeflinger;David Padua
Affiliations:
Purdue Univ., West Lafayette, IN;Univ. of Illinois, Urbana;Univ. of Illinois, Urbana
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1998

Citing 23
Cited 39

Compilers: principles, techniques, and tools

Compilers: principles, techniques, and tools
Efficient and exact data dependence analysis

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Practical dependence testing

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A practical algorithm for exact array dependence analysis

Communications of the ACM
Beyond induction variables

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Array privatization for parallel execution of loops

ICS '92 Proceedings of the 6th international conference on Supercomputing
The cedar system and an initial performance study

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Toward a methodology of optimizing programs for high-performance computers

ICS '93 Proceedings of the 7th international conference on Supercomputing
An HPF compiler for the IBM SP2

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Run-time methods for parallelizing partially parallel loops

ICS '95 Proceedings of the 9th international conference on Supercomputing
Idiom recognition in the Polaris parallelizing compiler

ICS '95 Proceedings of the 9th international conference on Supercomputing
The range test: a dependence test for symbolic, non-linear expressions

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Automatic Detection of Parallelism: A Grand Challenge for High-Performance Computing

IEEE Parallel & Distributed Technology: Systems & Technology
Benchmarking with Real Industrial Applications: The SPEC High-Performance Group

IEEE Computational Science & Engineering
Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs

IEEE Transactions on Parallel and Distributed Systems
Symbolic range propagation

IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing
Automatic Parallelization for Non-cache Coherent Multiprocessors

LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Are Parallel Workstations the Right Target for Parallelizing Compilers?

LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Experience in the Automatic Parallelization of Four Perfect-Benchmark Programs

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Data Dependence and Data-Flow Analysis of Arrays

Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Automatic Array Privatization

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Symbolic Analysis: A Basis for Parallelization, Optimization, and Scheduling of Programs

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Parallelizing while loops for multiprocessor systems

IPPS '95 Proceedings of the 9th International Symposium on Parallel Processing

A Compiler Optimization Algorithm for Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Constraint-based array dependence analysis

ACM Transactions on Programming Languages and Systems (TOPLAS)
Nonlinear and Symbolic Data Dependence Testing

IEEE Transactions on Parallel and Distributed Systems
Evaluating Automatic Parallelization in SUIF

IEEE Transactions on Parallel and Distributed Systems
Efficient Interprocedural Array Data-Flow Analysis for Automatic Program Parallelization

IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
A framework for remote dynamic program optimization

DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences

IEEE Transactions on Parallel and Distributed Systems
Monotonic evolution: an alternative to induction variable substitution for dependence analysis

ICS '01 Proceedings of the 15th international conference on Supercomputing
Unified Interprocedural Parallelism Detection

International Journal of Parallel Programming
Achieving Scalable Locality with Time Skewing

International Journal of Parallel Programming
ParAgent: A Domain-Specific Semi-automatic Parallelization Tool

HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
The Access Region Test

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP

LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Coarse-Grain Task Parallel Processing Using the OpenMP Backend of the OSCAR Multigrain Parallelizing Compiler

ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance

WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Using Elementary Linear Algebra to Solve Data Alignment for Arrays with Linear or Quadratic References

IEEE Transactions on Parallel and Distributed Systems
A Polynomial-Time Dependence Test for Determining Integer-Valued Solutions in Multi-Dimensional Arrays Under Variable Bounds

The Journal of Supercomputing
The Development of Parkbench and Performance Prediction

International Journal of High Performance Computing Applications
Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Systems research challenges: a scale-out perspective

IBM Journal of Research and Development
Combining compile-time and run-time parallelization

Scientific Programming
Parallel programming environment for OpenMP

Scientific Programming
Software-cooperative power-efficient heterogeneous multi-core for media processing

Proceedings of the 2008 Asia and South Pacific Design Automation Conference
One-dimensional I test and direction vector I test with array references by induction variable

International Journal of High Performance Computing and Networking
A multi-dimensional Interval Reduction test

International Journal of High Performance Computing and Networking
Runtime characterisation of irregular accesses applied to parallelisation of irregular reductions

International Journal of Computational Science and Engineering
Language Extensions in Support of Compiler Parallelization

Languages and Compilers for Parallel Computing
Can transactions enhance parallel programs?

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Coarse grain task parallel processing with cache optimization on shared memory multiprocessor

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Performance evaluation of compiler controlled power saving scheme

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures

VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
Performance analysis and tuning of automatically parallelized OpenMP applications

IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Compiler control power saving scheme for multi core processors

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A user-guided semi-automatic parallelization method and its implementation

APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Performance of OSCAR multigrain parallelizing compiler on SMP servers

LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Hierarchical parallelism control for multigrain parallel processing

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Portable section-level tuning of compiler parallelized applications

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation

International Journal of Parallel Programming

Abstract

This paper presents the results of the Cedar Hand-Parallelization Experiment, conducted from 1989 through 1992, within the Center for Supercomputing Research and Development (CSRD) at the University of Illinois. In this experiment, we manually transformed the Perfect Benchmarks® into parallel program versions. In doing so, we used techniques that may be automated in an optimizing compiler. We then ran these programs on the Cedar multiprocessor (built at CSRD during the 1980s) and measured the speed improvement due to each technique. The results presented here extend the findings previously reported in [11]. The techniques credited most for the performance gains include array privatization, parallelization of reduction operations, and the substitution of generalized induction variables. All these techniques can be considered extensions of transformations that were available in vectorizers and commercial restructuring compilers of the late 1980s. We applied these transformations by hand to the given programs, in a mechanical manner, similar to that of a parallelizing compiler. Because of our success with these transformations, we believed that it would be possible to implement many of these techniques in a new parallelizing compiler. Such a compiler has been completed in the meantime and we show preliminary results.
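
As a rough illustration of the three techniques the abstract credits most, the sketch below shows each one on a toy loop. It is not taken from the paper: the Perfect Benchmarks are Fortran codes, and the Python functions, names, and loop bodies here are hypothetical, chosen only to show how each rewrite removes a cross-iteration dependence so that the iterations could, in principle, be run in parallel.

```python
# Illustrative sketch only. Every function name and loop body below is
# hypothetical; none of it is code from the paper or from any compiler.

def giv_original(n):
    """Sequential form: k is a generalized induction variable (k += i)."""
    out = [0] * n
    k = 0
    for i in range(n):
        k += i                      # each iteration needs the previous k
        out[i] = 2 * k
    return out


def giv_substituted(n):
    """After substitution: k is computed from i alone via its closed form."""
    out = [0] * n
    for i in range(n):              # iterations no longer depend on each other
        k = i * (i + 1) // 2        # closed form of 0 + 1 + ... + i
        out[i] = 2 * k
    return out


def shared_work_array(a):
    """Sequential form: one work array t is reused by every iteration."""
    out = [0] * len(a)
    t = [0, 0]
    for i in range(len(a)):
        t[0] = a[i] + 1             # t is fully written before it is read
        t[1] = a[i] * 2
        out[i] = t[0] + t[1]
    return out


def privatized_work_array(a):
    """After array privatization: each iteration owns its copy of t."""
    out = [0] * len(a)
    for i in range(len(a)):
        t = [a[i] + 1, a[i] * 2]    # private to iteration i
        out[i] = t[0] + t[1]
    return out


def reduction_with_partials(a, num_workers):
    """Sum reduction split into private partial sums, combined at the end.

    The outer loop over w stands in for num_workers processors; here it
    simply runs sequentially. Integer addition is used so that the
    reordering of the sum cannot change the result.
    """
    partial = [0] * num_workers
    for w in range(num_workers):
        for i in range(w, len(a), num_workers):
            partial[w] += a[i]      # each worker touches only partial[w]
    return sum(partial)
```

In each pair, the rewritten loop produces the same values as the sequential one. With floating-point data, the reduction rewrite can change rounding because it reorders the additions.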