Experiments with auto-parallelizing SPEC2000FP benchmarks

Authors:
Guansong Zhang; Priya Unnikrishnan; James Ren
Affiliations:
IBM Toronto Lab, Toronto, ON, Canada; IBM Toronto Lab, Toronto, ON, Canada; IBM Toronto Lab, Toronto, ON, Canada
Venue:
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Year:
2004

Abstract

In this paper, we document our experimental work on automatically parallelizing the SPEC2000FP benchmarks for SMP machines. This is not purely a research project: it was implemented within IBM's software laboratory, in a commercial compiler infrastructure that implements the OpenMP 2.0 specifications for both Fortran and C/C++. From the beginning, our emphasis has been on using simple parallelization techniques. We aim to maintain a good trade-off between the performance of an application program, especially its scalability, and its compilation time. Although the parallelization results show relatively low speedup, they are still promising considering the problems associated with explicit parallel programming and the fact that more and more multithreaded and multi-core chips will soon be available, even for home computing.