The authors describe a flexible compiler framework for distributed-memory multicomputers called Paradigm (PARAllelizing compiler for DIstributed-memory General-purpose Multicomputers). Extracting computational power from a multicomputer usually requires significant programmer time and effort to write efficient software; Paradigm addresses this problem by automatically parallelizing sequential programs. Beyond traditional compiler optimizations, Paradigm integrates several capabilities within a unified platform: automatic data distribution, communication optimizations, support for irregular computations, exploitation of functional and data parallelism, and multithreaded execution.

Automatic data partitioning involves several decisions: array alignment, distribution scheme (block or cyclic), block size, and mesh configuration. Paradigm resolves these decisions in distinct compilation phases.

The compiler supports both regular and irregular computations. For regular computations, it uses efficient processor-tagged descriptors to handle the simplest and most frequent cases, and falls back to more general, inequality-based representations for the difficult ones. This lets Paradigm compile a larger proportion of programs without sacrificing compilation speed. In addition, to reduce the overhead caused by frequent communication, the compiler employs message coalescing, message vectorization, message aggregation, and coarse-grain pipelining.

For irregular computations, Paradigm generates two code sequences: an inspector that preprocesses the access pattern and an executor that performs the actual computation. The Parallel Irregular Library with Application of Regularity (PILAR) provides Paradigm's irregular runtime support. Finally, the authors describe how Paradigm exploits functional and data parallelism and multithreading to improve overall execution efficiency.
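To make the data-distribution choices concrete, the following is a minimal sketch (not Paradigm's actual code) of the ownership functions implied by block and (block-)cyclic distributions of a one-dimensional array over P processors; the function names and parameters are illustrative only.

```python
# Illustrative sketch: which processor owns element i of an n-element
# array under block vs. (block-)cyclic distribution across p processors.

def block_owner(i, n, p):
    """Owner of index i when the array is split into p contiguous blocks."""
    block = -(-n // p)      # ceil(n / p): elements per processor
    return i // block

def cyclic_owner(i, p, b=1):
    """Owner of index i under cyclic distribution; b > 1 gives block-cyclic."""
    return (i // b) % p

# A 16-element array on 4 processors:
#   block:  indices 0-3 -> proc 0, 4-7 -> proc 1, ...
#   cyclic: index i -> proc i % 4
```

The compiler's data-partitioning phase effectively chooses among such layouts (plus alignment and mesh shape) to minimize communication for the access patterns it finds in the program.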
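The communication optimizations can be illustrated with a small sketch of message vectorization: instead of issuing one send per loop iteration, the compiler hoists communication out of the loop and transmits all needed elements in one message. The "network" below is just a list of (dest, payload) records for demonstration; none of these names come from Paradigm itself.

```python
# Hedged sketch of message vectorization (illustrative names only).

sent_messages = []          # stand-in for the network: (dest, payload) records

def send(dest, payload):
    sent_messages.append((dest, payload))

def naive_exchange(remote_indices, data, dest):
    # One message per element: high per-message startup overhead.
    for i in remote_indices:
        send(dest, [data[i]])

def vectorized_exchange(remote_indices, data, dest):
    # One coalesced message carrying every needed element.
    send(dest, [data[i] for i in remote_indices])
```

Sending three elements the naive way costs three message startups; the vectorized version pays one, which is the point of coalescing, vectorization, and aggregation alike.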
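The inspector/executor scheme for irregular accesses can be sketched as follows; the function names are hypothetical and do not reflect PILAR's API. The inspector scans the indirection array once to separate local accesses from a remote-fetch schedule, which the executor then reuses on every iteration with prefetched values.

```python
# Hedged sketch of the inspector/executor scheme (illustrative API).

def inspector(index_array, owned_range):
    """Split accesses into local indices and a remote-fetch schedule."""
    lo, hi = owned_range
    local = [i for i in index_array if lo <= i < hi]
    remote = sorted({i for i in index_array if not (lo <= i < hi)})
    return local, remote            # 'remote' acts as the communication schedule

def executor(index_array, local_data, fetched, owned_range):
    """Run the computation using locally owned and prefetched remote values."""
    lo, hi = owned_range
    total = 0
    for i in index_array:
        if lo <= i < hi:
            total += local_data[i - lo]   # locally owned element
        else:
            total += fetched[i]           # value prefetched per the schedule
    return total
```

Because the schedule computed by the inspector depends only on the indirection array, it can be built once and reused across iterations, which is where the scheme recovers the cost of the preprocessing pass.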