Maximizing Multiprocessor Performance with the SUIF Compiler

  • Authors:
  • Mary W. Hall, Jennifer M. Anderson, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, Edouard Bugnion, Monica S. Lam


  • Venue:
  • Computer
  • Year:
  • 1996


Abstract

Multiple processors can work together to speed up single applications, but sequential programs must be rewritten to take advantage of the extra processors. One way to do this is through automatic parallelization with a compiler. Multiprocessors pose especially challenging problems for parallelizing compilers. Sufficient work must be performed in parallel to overcome processor synchronization and communication overhead. Moreover, multiprocessor memory hierarchies are complex, containing both shared memory and multiple levels of cache memory. Thus, two techniques are essential in obtaining good multiprocessor performance for array-based numerical programs: locating coarse-grain parallelism and managing multiprocessor memory use. The authors describe new technology in the Stanford SUIF compiler that enables it to successfully carry out these techniques. First, a suite of robust analysis techniques operates across procedure boundaries to locate coarse-grain parallelism so that large computations can execute independently in parallel. Then, to help eliminate cache misses, affine partitioning is used to improve processor reuse of data, and data permutation and data strip-mining make contiguous the data accessed by each processor in the shared address space. When employed in the automatic parallelizing compiler, these techniques significantly improve the performance of half the programs in the NAS and SPECfp95 benchmark suites.