Parallel Programming with Polaris

Authors:
William Blume;Ramon Doallo;Rudolf Eigenmann;John Grout;Jay Hoeflinger;Thomas Lawrence;Jaejin Lee;David Padua;Yunheung Paek;Bill Pottenger;Lawrence Rauchwerger;Peng Tu
Venue:
Computer
Year:
1996

Citing 4
Cited 140

The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Idiom recognition in the Polaris parallelizing compiler

ICS '95 Proceedings of the 9th international conference on Supercomputing
The range test: a dependence test for symbolic, non-linear expressions

Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Automatic Array Privatization

Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing

Simplification of array access patterns for compiler optimizations

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Comparing data forwarding and prefetching for communication-induced misses in shared-memory MPs

ICS '98 Proceedings of the 12th international conference on Supercomputing
The role of associativity and commutativity in the detection and transformation of loop-level parallelism

ICS '98 Proceedings of the 12th international conference on Supercomputing
Measuring the effectiveness of automatic parallelization in SUIF

ICS '98 Proceedings of the 12th international conference on Supercomputing
Nonlinear and Symbolic Data Dependence Testing

IEEE Transactions on Parallel and Distributed Systems
SUIF Explorer: an interactive and interprocedural parallelizer

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Evaluation of predicated array data-flow analysis for automatic parallelization

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Techniques for the translation of MATLAB programs into Fortran 90

ACM Transactions on Programming Languages and Systems (TOPLAS)
A Chip-Multiprocessor Architecture with Speculative Multithreading

IEEE Transactions on Computers
Evaluating Automatic Parallelization in SUIF

IEEE Transactions on Parallel and Distributed Systems
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Adaptive reduction parallelization techniques

Proceedings of the 14th international conference on Supercomputing
A compiler method for the parallel execution of irregular reductions in scalable shared memory multiprocessors

Proceedings of the 14th international conference on Supercomputing
Efficient Interprocedural Array Data-Flow Analysis for Automatic Program Parallelization

IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Maximal Static Expansion

International Journal of Parallel Programming
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs

International Journal of Parallel Programming
Compiler analysis of irregular memory accesses

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Architectural support for scalable speculative parallelization in shared-memory multiprocessors

Proceedings of the 27th annual international symposium on Computer architecture
Towards an integrated, web-executable parallel programming tool environment

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
A synthesis of memory mechanisms for distributed architectures

ICS '01 Proceedings of the 15th international conference on Supercomputing
Monotonic evolution: an alternative to induction variable substitution for dependence analysis

ICS '01 Proceedings of the 15th international conference on Supercomputing
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor

ICS '01 Proceedings of the 15th international conference on Supercomputing
Removing architectural bottlenecks to the scalability of speculative parallelization

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Reference idempotency analysis: a framework for optimizing speculative execution

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
High-level adaptive program optimization with ADAPT

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Automatic Code Mapping on an Intelligent Memory Architecture

IEEE Transactions on Computers
Efficient and precise array access analysis

ACM Transactions on Programming Languages and Systems (TOPLAS)
Hybrid analysis: static & dynamic memory reference analysis

ICS '02 Proceedings of the 16th international conference on Supercomputing
An Advanced Compiler Framework for Non-Cache-Coherent Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Speculative synchronization: applying thread-level speculation to explicitly parallel applications

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Unified Interprocedural Parallelism Detection

International Journal of Parallel Programming
The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors

International Journal of Parallel Programming
Programming Languages for CSE: The State of the Art

IEEE Computational Science & Engineering
Changing Interaction of Compiler and Architecture

Computer
Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance

IEEE Transactions on Computers
Compiler Techniques for Effective Communication on Distributed-Memory Multiprocessors

ICPP '97 Proceedings of the international Conference on Parallel Processing
Compiling Several Classes of Communication Patterns on a Multithreaded Architecture

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Is OpenMP for Grids?

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
SmartApps: An Application Centric Approach to High Performance Computing: Compiler-Assisted Software and Hardware Support for Reduction Operations

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
The R-LRPD Test: Speculative Parallelization of Partially Parallel Loops

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures

IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Data Locality Exploitation in Algorithms including Sparse Communications

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Principles of Speculative Run-Time Parallelization

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Beyond Arrays - A Container-Centric Approach for Parallelization of Real-World Symbolic Applications

LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
The Access Region Test

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Run-Time Parallelization Optimization Techniques

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Compiling for Speculative Architectures

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
An Automatic Iteration/Data Distribution Method Based on Access Descriptors for DSMM

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Compile-Time Based Performance Prediction

LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
A Performance Advisor Tool for Shared-Memory Parallel Programming

LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Towards Detection of Coarse-Grain Loop-Level Parallelism in Irregular Computations

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
On Automatic Parallelization of Irregular Reductions on Scalable Shared Memory Systems

Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Portable Compilers for OpenMP

WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Techniques for Reducing the Overhead of Run-Time Parallelization

CC '00 Proceedings of the 9th International Conference on Compiler Construction
On the Automatic Parallelization of Sparse and Irregular Fortran Programs

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
A Case for Combining Compile-Time and Run-Time Parallelization

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Locality Enhancement for Large-Scale Shared-Memory Multiprocessors

LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Overlap of computation and communication on shared-memory networks-of-workstations

Cluster computing
Estimating cache misses and locality using stack distances

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
A GSA-based compiler infrastructure to extract parallelism from complex loops

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
The impact of data dependence analysis on compilation and program parallelization

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Tradeoffs in Buffering Memory State for Thread-Level Speculation in Multiprocessors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
ADAPT: Automated De-Coupled Adaptive Program Transformation

ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
A Clustered Approach to Multithreaded Processors

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Experimental Study of Compiler Techniques for NUMA Machines

IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Compiler Techniques for the Distribution of Data and Computation

IEEE Transactions on Parallel and Distributed Systems
Run-Time Support for the Automatic Parallelization of Java Programs

The Journal of Supercomputing
BLOB computing

Proceedings of the 1st conference on Computing frontiers
A compiler tool to predict memory hierarchy performance of scientific codes

Parallel Computing
Hybrid analysis: static & dynamic memory reference analysis

International Journal of Parallel Programming
A Multilevel Computing Architecture for Embedded Multimedia Applications

IEEE Micro
Shared Memory Parallelization of Data Mining Algorithms: Techniques, Programming Interface, and Performance

IEEE Transactions on Knowledge and Data Engineering
Adaptive execution techniques for SMT multiprocessor architectures

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Parallel techniques in irregular codes: cloth simulation as case of study

Journal of Parallel and Distributed Computing
Interprocedural parallelization analysis in SUIF

ACM Transactions on Programming Languages and Systems (TOPLAS)
A methodology for detailed performance modeling of reduction computations on SMP machines

Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Computer Architecture: Challenges and Opportunities for the Next Decade

IEEE Micro
Facilitating the search for compositions of program transformations

Proceedings of the 19th annual international conference on Supercomputing
Tradeoffs in buffering speculative memory state for thread-level speculation in multiprocessors

ACM Transactions on Architecture and Code Optimization (TACO)
Deep Jam: Conversion of Coarse-Grain Parallelism to Instruction-Level and Vector Parallelism for Irregular Applications

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Data dependence analysis techniques for increased accuracy and extracted parallelism

International Journal of Parallel Programming - Special issue II: The 17th annual international conference on supercomputing (ICS'03)
Analytical modeling of codes with arbitrary data-dependent conditional structures

Journal of Systems Architecture: the EUROMICRO Journal
On the parallelization of irregular and dynamic programs

Parallel Computing
Region array SSA

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
An empirical evaluation of chains of recurrences for array dependence testing

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Exploiting reference idempotency to reduce speculative storage overflow

ACM Transactions on Programming Languages and Systems (TOPLAS)
An Adaptive Algorithm Selection Framework for Reduction Parallelization

IEEE Transactions on Parallel and Distributed Systems
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
Combining compile-time and run-time parallelization

Scientific Programming
Parallel programming environment for OpenMP

Scientific Programming
Software behavior oriented parallelization

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Sensitivity analysis for automatic parallelization on multi-cores

Proceedings of the 21st annual international conference on Supercomputing
A network-computing infrastructure for tool experimentation applied to computer architecture education

WCAE '00 Proceedings of the 2000 workshop on Computer architecture education
Precise automatable analytical modeling of the cache behavior of codes with indirections

ACM Transactions on Architecture and Code Optimization (TACO)
Runtime characterisation of irregular accesses applied to parallelisation of irregular reductions

International Journal of Computational Science and Engineering
An analytical model of locality-based parallel irregular reductions

Parallel Computing
Compiler and hardware support for reducing the synchronization of speculative threads

ACM Transactions on Architecture and Code Optimization (TACO)
XARK: An extensible framework for automatic recognition of computational kernels

ACM Transactions on Programming Languages and Systems (TOPLAS)
Efficiently Building the Gated Single Assignment Form in Codes with Pointers in Modern Optimizing Compilers

Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Flow-Sensitive Loop-Variant Variable Classification in Linear Time

Languages and Compilers for Parallel Computing
Commutativity analysis for software parallelization: letting program transformations see the big picture

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
A translation system for enabling data mining applications on GPUs

Proceedings of the 23rd international conference on Supercomputing
Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore

IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
The use of hardware transactional memory for the trace-based parallelization of recursive Java programs

PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Adaptive execution techniques of parallel programs for multiprocessors

Journal of Parallel and Distributed Computing
Can transactions enhance parallel programs?

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Exploiting speculative thread-level parallelism in data compression applications

LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
OpenMP and compilation issue in embedded applications

WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Supporting realistic OpenMP applications on a commodity cluster of workstations

WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Compiler and middleware support for scalable data mining

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
A compiler framework to detect parallelism in irregular codes

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
The structure of a compiler for explicit and implicit parallelism

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Induction variable analysis without idiom recognition: beyond monotonicity

LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
A modular and extensible macroprogramming compiler

Proceedings of the 2010 ICSE Workshop on Software Engineering for Sensor Network Applications
On the interaction of tiling and automatic parallelization

IWOMP'05/IWOMP'06 Proceedings of the 2005 and 2006 international conference on OpenMP shared memory parallel programming
Automatic Parallelization in a Binary Rewriter

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Kremlin: rethinking and rebooting gprof for the multicore age

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
An execution strategy and optimized runtime support for parallelizing irregular reductions on modern GPUs

Proceedings of the international conference on Supercomputing
Performance analysis and tuning of automatically parallelized OpenMP applications

IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Scalable array SSA and array data flow analysis

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Interprocedural symbolic range propagation for optimizing compilers

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A user-guided semi-automatic parallelization method and its implementation

APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Towards a versatile pointer analysis framework

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Parallel reductions: an application of adaptive algorithm selection

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Adaptively increasing performance and scalability of automatically parallelized programs

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Compiler and runtime support for shared memory parallelization of data mining algorithms

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Performance analysis of symbolic analysis techniques for parallelizing compilers

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Automatic scoping of variables in parallel regions of an OpenMP program

WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
An evaluation of auto-scoping in OpenMP

WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
Automatically tuning parallel and parallelized programs

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
OSCAR API for real-time low-power multicores and its performance on multicores and SMP servers

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Paragon: collaborative speculative loop execution on GPU and CPU

Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Logical inference techniques for loop parallelization

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
HydraVM: extracting parallelism from legacy sequential code using STM

HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Portable section-level tuning of compiler parallelized applications

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
From serial loops to parallel execution on distributed systems

Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Skeletal based programming for dynamic programming on MultiGPU systems

The Journal of Supercomputing
The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation

International Journal of Parallel Programming
Leveraging GPUs using cooperative loop speculation

ACM Transactions on Architecture and Code Optimization (TACO)

Quantified Score

Hi-index	4.11

Abstract

As we reach the technological limits of hardware improvement, we must rely on multiple processors to improve program execution speed. Parallel programming tools are limited, making effective parallel programming difficult and cumbersome. Compilers that translate conventional sequential programs into parallel form would liberate programmers from the complexities of explicit, machine-oriented parallel programming. Polaris, an experimental translator of conventional Fortran programs that targets machines such as the Cray T3D, is the first step toward this goal. The most important techniques implemented in Polaris resulted from a study of the effectiveness of commercial Fortran parallelizers. The authors compiled the Perfect Benchmarks, a collection of conventional Fortran programs representing the typical workload of high-performance computers, for the Alliant FX/80, an eight-processor multiprocessor popular in the late 1980s. For each program, they measured the quality of the parallelization by computing the speedup. With few exceptions, the Alliant Fortran compiler failed to deliver any significant speedup, because it could not parallelize some of the most important loops in the Perfect Benchmarks. The study showed that extending the four most important analysis and transformation techniques traditionally used for vectorization leads to significant increases in speedup. Polaris detected much of the parallelism available in the set of benchmark codes. A careful analysis of the remaining loops that Polaris could parallelize highlights four areas for improvement.
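
The abstract uses speedup as its measure of parallelization quality without defining it. A minimal sketch of the conventional definition follows. The symbols T_1, T_p and S_p and the sample timings are assumptions made here for illustration. They are not taken from the article, and the abstract does not say which baseline the authors used.

```latex
% Conventional speedup definition (assumed, not quoted from the article):
%   T_1 = execution time of the original sequential program
%   T_p = execution time of the parallelized program on p processors
\[
  S_p = \frac{T_1}{T_p}
\]
% On an eight-processor machine such as the Alliant FX/80, p = 8,
% so ideal (linear) speedup would be S_8 = 8.
% Hypothetical timings, for illustration only:
\[
  T_1 = 120\,\mathrm{s}, \quad T_8 = 40\,\mathrm{s}
  \quad\Longrightarrow\quad
  S_8 = \frac{120}{40} = 3
\]
```

Under this definition, a compiler that delivers no significant speedup leaves S_p close to 1 no matter how many processors are available.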