Semantical interprocedural parallelization: an overview of the PIPS project
ICS '91 Proceedings of the 5th international conference on Supercomputing
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Detecting coarse-grain parallelism using an interprocedural parallelizing compiler
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Automatic data layout for high performance Fortran
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Symbolic analysis for parallelizing compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler-directed page coloring for multiprocessors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Interprocedural symbolic analysis
Interprocedural symbolic analysis
Continuous profiling: where have all the cycles gone?
ACM Transactions on Computer Systems (TOCS)
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
Tuning compiler optimizations for simultaneous multithreading
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Measuring the effectiveness of automatic parallelization in SUIF
ICS '98 Proceedings of the 12th international conference on Supercomputing
Active pages: a computation model for intelligent memory
Proceedings of the 25th annual international symposium on Computer architecture
A performance analysis environment for life
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
SIGSOFT '98/FSE-6 Proceedings of the 6th ACM SIGSOFT international symposium on Foundations of software engineering
Procedure cloning: a transformation for improved system-level functional partitioning
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Maps: a compiler-managed memory system for raw machines
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Evaluation of predicated array data-flow analysis for automatic parallelization
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
A graphic parallelizing environment for user-compiler interaction
ICS '99 Proceedings of the 13th international conference on Supercomputing
Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors
ICS '99 Proceedings of the 13th international conference on Supercomputing
Increasing effective IPC by exploiting distant parallelism
ICS '99 Proceedings of the 13th international conference on Supercomputing
A DAG-based design approach for reconfigurable VLIW processors
DATE '99 Proceedings of the conference on Design, automation and test in Europe
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Exploiting ILP in page-based intelligent memory
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Software-Directed Register Deallocation for Simultaneous Multithreaded Processors
IEEE Transactions on Parallel and Distributed Systems
Evaluating Automatic Parallelization in SUIF
IEEE Transactions on Parallel and Distributed Systems
A programmable preprocessor for parallelizing Fortran-90
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Mapping irregular applications to DIVA, a PIM-based data-intensive architecture
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
An annotation language for optimizing software libraries
Proceedings of the 2nd conference on Domain-specific languages
Proceedings of the 14th international conference on Supercomputing
Efficient Interprocedural Array Data-Flow Analysis for Automatic Program Parallelization
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Tuning Compiler Optimizations for Simultaneous Multithreading
International Journal of Parallel Programming - Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part II
The Design of the PROMIS Compiler—Towards Multi-Level Parallelization
International Journal of Parallel Programming - Special issue on international symposium on high performance computing 1997, part I
Memory arbitration and cache management in stream-based systems
DATE '00 Proceedings of the conference on Design, automation and test in Europe
DATE '00 Proceedings of the conference on Design, automation and test in Europe
International Journal of Parallel Programming
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs
International Journal of Parallel Programming
A Unified Symbolic Evaluation Framework for Parallelizing Compilers
IEEE Transactions on Parallel and Distributed Systems
Journal of VLSI Signal Processing Systems
Architectural support for scalable speculative parallelization in shared-memory multiprocessors
Proceedings of the 27th annual international symposium on Computer architecture
A decade of reconfigurable computing: a visionary retrospective
Proceedings of the conference on Design, automation and test in Europe
Towards an integrated, web-executable parallel programming tool environment
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Coarse grain reconfigurable architecture (embedded tutorial)
Proceedings of the 2001 Asia and South Pacific Design Automation Conference
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor
ICS '01 Proceedings of the 15th international conference on Supercomputing
Reference idempotency analysis: a framework for optimizing speculative execution
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Accurate data redistribution cost estimation in software distributed shared memory systems
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Source code optimization and profiling of energy consumption in embedded systems
ISSS '00 Proceedings of the 13th international symposium on System synthesis
The very portable optimizer for digital signal processors
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler-Directed Collective-I/O
IEEE Transactions on Parallel and Distributed Systems
Compiler Support for Scalable and Efficient Memory Systems
IEEE Transactions on Computers
Complex library mapping for embedded software using symbolic algebra
Proceedings of the 39th annual Design Automation Conference
Hybrid analysis: static & dynamic memory reference analysis
ICS '02 Proceedings of the 16th international conference on Supercomputing
A general compiler framework for speculative multithreading
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
An Advanced Compiler Framework for Non-Cache-Coherent Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Partitioning sequential programs for CAD using a three-step approach
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Instruction generation and regularity extraction for reconfigurable processors
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Instruction generation for hybrid reconfigurable systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Parallel Programming and Performance Evaluation with the URSA Tool Family
International Journal of Parallel Programming
The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors
International Journal of Parallel Programming
Computer
IEEE Micro
Exploiting Data Value Prediction in Compiler Based Thread Formation
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Charon Message-Passing Toolkit for Scientific Computations
HiPC '00 Proceedings of the 7th International Conference on High Performance Computing
High Level Software Synthesis of Affine Iterative Algorithms onto Parallel Architectures
HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
A Compiler Infrastructure for High-Performance Java
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Compiling Several Classes of Communication Patterns on a Multithreaded Architecture
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Efficient Pipelining of Nested Loops: Unroll-and-Squash
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Compiler and Runtime Support for Irregular Reductions on a Multithreaded Architecture
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Principles of Speculative Run-Time Parallelization
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Beyond Arrays - A Container-Centric Approach for Parallelization of Real-World Symbolic Applications
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Compiling for Speculative Architectures
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Automatic Coarse Grain Task Parallel Processing on SMP Using OpenMP
LCPC '00 Proceedings of the 13th International Workshop on Languages and Compilers for Parallel Computing-Revised Papers
Compiler-Controlled Caching in Superword Register Files for Multimedia Extension Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Condensed Graphs: A Multi-level, Parallel, Intermediate Representation
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
On Automatic Parallelization of Irregular Reductions on Scalable Shared Memory Systems
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Parallel Computation: MM +/- X
Informatics - 10 Years Back. 10 Years Ahead.
Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems
CC '02 Proceedings of the 11th International Conference on Compiler Construction
On Availability of Bit-Narrow Operations in General-Purpose Applications
FPL '00 Proceedings of the The Roadmap to Reconfigurable Computing, 10th International Workshop on Field-Programmable Logic and Applications
A Case for Combining Compile-Time and Run-Time Parallelization
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
MARS: A Distributed Memory Approach to Shared Memory Compilation
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Locality Enhancement for Large-Scale Shared-Memory Multiprocessors
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Optimizing Mutual Exclusion Synchronization in Explicitly Parallel Programs
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Data communication estimation and reduction for reconfigurable systems
Proceedings of the 40th annual Design Automation Conference
TEST: a tracer for extracting speculative threads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Advanced copy propagation for arrays
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
A compiler approach for reducing data cache energy
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Parallelizing Applications into Silicon
FCCM '99 Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Task Graph Extraction for Embedded System Synthesis
VLSID '03 Proceedings of the 16th International Conference on VLSI Design
Reducing Cost and Tolerating Defects in Page-based Intelligent Memory
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
The Jrpm system for dynamically parallelizing Java programs
Proceedings of the 30th annual international symposium on Computer architecture
A Clustered Approach to Multithreaded Processors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Efficient support for pipelining in software distributed shared memory systems
Real-time system security
Compiler parallelization of C programs for multi-core DSPs with multiple address spaces
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Programming challenges in network processor deployment
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Run-Time Support for the Automatic Parallelization of Java Programs
The Journal of Supercomputing
SAGE: an automatic analyzing system for a new high-performance SoC architecture-processor-in-memory
Journal of Systems Architecture: the EUROMICRO Journal
ACM Transactions on Computer Systems (TOCS)
Hybrid analysis: static & dynamic memory reference analysis
International Journal of Parallel Programming
Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
IEEE Transactions on Knowledge and Data Engineering
A Complete Compiler Approach to Auto-Parallelizing C Programs for Multi-DSP Systems
IEEE Transactions on Parallel and Distributed Systems
Superword-Level Parallelism in the Presence of Control Flow
Proceedings of the international symposium on Code generation and optimization
A Partitioning Methodology for Accelerating Applications in Hybrid Reconfigurable Platforms
Proceedings of the conference on Design, Automation and Test in Europe - Volume 3
An Application Analysis Framework For Polymorphic Chip Multiprocessors
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Software-Directed Disk Power Management for Scientific Applications
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
Power and Performance in I/O for Scientific Applications
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Pace--A Toolset for the Performance Prediction of Parallel and Distributed Systems
International Journal of High Performance Computing Applications
Energy management in software-controlled multi-level memory hierarchies
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Probabilistic source-level optimisation of embedded programs
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Exposing disk layout to compiler for reducing energy consumption of parallel disk based systems
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Interprocedural parallelization analysis in SUIF
ACM Transactions on Programming Languages and Systems (TOPLAS)
An evaluation of code and data optimizations in the context of disk power reduction
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
A methodology for detailed performance modeling of reduction computations on SMP machines
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Reducing data cache leakage energy using a compiler-based approach
ACM Transactions on Embedded Computing Systems (TECS)
Facilitating the search for compositions of program transformations
Proceedings of the 19th annual international conference on Supercomputing
Disk layout optimization for reducing energy consumption
Proceedings of the 19th annual international conference on Supercomputing
A unified theory of timing budget management
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
High-level synthesis using computation-unit integrated memories
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Energy-aware computation duplication for improving reliability in embedded chip multiprocessors
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
SCMP: a single-chip message-passing parallel computer
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Proceedings of the International Symposium on Code Generation and Optimization
Using Machine Learning to Focus Iterative Optimization
Proceedings of the International Symposium on Code Generation and Optimization
Probabilistic Delay Budgeting for Soft Realtime Applications
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
Energy-aware data prefetching for multi-speed disks
Proceedings of the 3rd conference on Computing frontiers
Fast timing closure by interconnect criticality driven delay relaxation
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Predictive search distributions
ICML '06 Proceedings of the 23rd international conference on Machine learning
Leakage-aware intraprogram voltage scaling for embedded processors
Proceedings of the 43rd annual Design Automation Conference
Exploiting reference idempotency to reduce speculative storage overflow
ACM Transactions on Programming Languages and Systems (TOPLAS)
Source level transformations to improve I/O data partitioning
SNAPI '03 Proceedings of the international workshop on Storage network architecture and parallel I/Os
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
International Journal of Parallel Programming
Reducing power through compiler-directed barrier synchronization elimination
Proceedings of the 2006 international symposium on Low power electronics and design
Automatic performance model construction for the fast software exploration of new hardware designs
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Compiler optimization of embedded applications for an adaptive SoC architecture
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Locality and parallelism optimization for dynamic programming algorithm in bioinformatics
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread
Microprocessors & Microsystems
Cache oblivious algorithms for nonserial polyadic programming
The Journal of Supercomputing
Cache miss clustering for banked memory systems
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
The Journal of Supercomputing
Combining compile-time and run-time parallelization[1]
Scientific Programming
Fast compiler optimisation evaluation using code-feature based performance prediction
Proceedings of the 4th international conference on Computing frontiers
A unified evaluation framework for coarse grained reconfigurable array architectures
Proceedings of the 4th international conference on Computing frontiers
A compiler cost model for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO)
Taming the memory hogs: using compiler-inserted releases to manage physical memory intelligently
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Memory bank aware dynamic loop scheduling
Proceedings of the conference on Design, automation and test in Europe
An annotation language for optimizing software libraries
DSL'99 Proceedings of the 2nd conference on Conference on Domain-Specific Languages - Volume 2
Typed common intermediate format
DSL'97 Proceedings of the Conference on Domain-Specific Languages on Conference on Domain-Specific Languages (DSL), 1997
Optimization of data prefetch helper threads with path-expression based statistical modeling
Proceedings of the 21st annual international conference on Supercomputing
An ilp based approach to reducing energy consumption in nocbased CMPS
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
SPRINT: a tool to generate concurrent transaction-level models from sequential code
EURASIP Journal on Applied Signal Processing
Hierarchical coarse-grained stream compilation for software defined radio
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler-Directed Energy Optimization for Parallel Disk Based Systems
IEEE Transactions on Parallel and Distributed Systems
Energy-optimizing source code transformations for operating system-driven embedded software
ACM Transactions on Embedded Computing Systems (TECS)
Program optimization space pruning for a multithreaded gpu
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Software-cooperative power-efficient heterogeneous multi-core for media processing
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Runtime characterisation of irregular accesses applied to parallelisation of irregular reductions
International Journal of Computational Science and Engineering
An analytical model of locality-based parallel irregular reductions
Parallel Computing
Improving I/O performance of applications through compiler-directed code restructuring
FAST'08 Proceedings of the 6th USENIX Conference on File and Storage Technologies
Language Extensions in Support of Compiler Parallelization
Languages and Compilers for Parallel Computing
Languages and Compilers for Parallel Computing
Automatic Discovery of Coarse-Grained Parallelism in Media Applications
Transactions on High-Performance Embedded Architectures and Compilers I
A Prefetching Algorithm for Multi-speed Disks
Transactions on High-Performance Embedded Architectures and Compilers I
SPM management using Markov chain based data access prediction
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Resource aware mapping on coarse grained reconfigurable arrays
Microprocessors & Microsystems
Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays
The Journal of Supercomputing
A translation system for enabling data mining applications on GPUs
Proceedings of the 23rd international conference on Supercomputing
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Evaluating compiler technology for control-flow optimizations for multimedia extension architectures
Microprocessors & Microsystems
Stream Compilation for Real-Time Embedded Multicore Systems
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Proceedings of the 2009 conference on Information Science, Technology and Applications
PPPJ '09 Proceedings of the 7th International Conference on Principles and Practice of Programming in Java
Modern development methods and tools for embedded reconfigurable systems: A survey
Integration, the VLSI Journal
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive execution techniques of parallel programs for multiprocessors
Journal of Parallel and Distributed Computing
Compiler directed network-on-chip reliability enhancement for chip multiprocessors
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
Can transactions enhance parallel programs?
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Quantifying uncertainty in points-to relations
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
OpenMP and compilation issue in embedded applications
WOMPAT'03 Proceedings of the OpenMP applications and tools 2003 international conference on OpenMP shared memory parallel programming
Advanced symbolic analysis for compilers: new techniques and algorithms for symbolic program analysis and optimization
The structure of a compiler for explicit and implicit parallelism
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Coarse grain task parallel processing with cache optimization on shared memory multiprocessor
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Toward to utilize the heterogeneous multiple processors of the chip multiprocessor architecture
EUC'07 Proceedings of the 2007 international conference on Embedded and ubiquitous computing
Performance evaluation of compiler controlled power saving scheme
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
A modular and extensible macroprogramming compiler
Proceedings of the 2010 ICSE Workshop on Software Engineering for Sensor Network Applications
A profile-based tool for finding pipeline parallelism in sequential programs
Parallel Computing
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Automatic Parallelization in a Binary Rewriter
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Probabilistic delay budget assignment for synthesis of soft real-time applications
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
SSIP'05 Proceedings of the 5th WSEAS international conference on Signal, speech and image processing
Kremlin: rethinking and rebooting gprof for the multicore age
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
ALTER: exploiting breakable dependences for parallelization
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the international conference on Supercomputing
HAWKEYE: effective discovery of dataflow impediments to parallelization
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
A language for the compact representation of multiple program versions
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Loop selection for thread-level speculation
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Compiler control power saving scheme for multi core processors
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Parallelizing user-defined and implicit reductions globally on multiprocessors
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Automatic parallelization using the value evolution graph
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Embedded Systems Design
Hierarchical parallelism control for multigrain parallel processing
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Compiler and runtime support for shared memory parallelization of data mining algorithms
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Automatic scoping of variables in parallel regions of an OpenMP program
WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
An evaluation of auto-scoping in OpenMP
WOMPAT'04 Proceedings of the 5th international conference on OpenMP Applications and Tools: shared Memory Parallel Programming with OpenMP
OSCAR API for real-time low-power multicores and its performance on multicores and SMP servers
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Automated Mapping of the MapReduce Pattern onto Parallel Computing Platforms
Journal of Signal Processing Systems
HydraVM: extracting parallelism from legacy sequential code using STM
HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
Automatic extraction of multi-objective aware pipeline parallelism using genetic algorithms
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A framework for end-to-end verification and evaluation of register allocators
SAS'07 Proceedings of the 14th international conference on Static Analysis
From serial loops to parallel execution on distributed systems
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Designer-in-the-loop recoding of ESL models using static parallel access conflict analysis
Proceedings of the 16th International Workshop on Software and Compilers for Embedded Systems
Parallelizing Sequential Programs with Statistical Accuracy Tests
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Probabilistic Embedded Computing
CUBIT: compact bitmap profiling for dynamic data dependence analysis
Proceedings of the 2013 Research in Adaptive and Convergent Systems
Automatic extraction of pipeline parallelism for embedded heterogeneous multi-core platforms
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Palirria: Accurate On-line Parallelism Estimation for Adaptive Work-Stealing
Proceedings of Programming Models and Applications on Multicores and Manycores
Integrating profile-driven parallelism detection and machine-learning-based mapping
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 4.11 |
Multiple processors can work together to speed up single applications, but sequential programs must be rewritten to take advantage of the extra processors. One way to do this is through automatic parallelization with a compiler. Multiprocessors pose especially challenging problems for parallelizing compilers. Sufficient work must be performed in parallel to overcome processor synchronization and communication overhead. Moreover, multiprocessor memory hierarchies are complex, containing both shared memory and multiple levels of cache memory. Thus, two techniques are essential in obtaining good multiprocessor performance for array-based numerical programs: locating coarse-grain parallelism and managing multiprocessor memory use. The authors describe new technology in the Stanford SUIF compiler that enables it to successfully carry out these techniques. First, a suite of robust analysis techniques operate across procedure boundaries to locate coarse-grain parallelism so that large computations can execute independently in parallel. Then, to help eliminate cache misses, affine partitioning is used to improve processor reuse of data, and data permutation and data strip-mining make contiguous the data accessed by each processor in the shared address space. When employed in the automatic parallelizing compiler, these techniques significantly affect the performance of half the NAS and SPECfp95 benchmark suites.