Journal of the ACM (JACM)
The Organization of Computations for Uniform Recurrence Equations
Journal of the ACM (JACM)
Systolic algorithms to examine all pairs of elements
Communications of the ACM
IEEE Transactions on Computers
Parallel scheduling of recursively defined arrays
Journal of Symbolic Computation
Compiling C for vectorization, parallelization, and inline expansion
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Synthesizing Linear Array Algorithms from Nested FOR Loop Algorithms
IEEE Transactions on Computers
The importance of direct dependences for automatic parallelization
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Automatic discovery of parallelism: a tool and an experiment (extended abstract)
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
The symbolic hyperplane transformation for recursively defined arrays
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
On high-speed computing with a programmable linear array
Proceedings of the 1988 ACM/IEEE conference on Supercomputing
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A Note on the Linear Transformation Method for Systolic Array Design
IEEE Transactions on Computers
Vectorization and parallelization of irregular problems via graph coloring
ICS '91 Proceedings of the 5th international conference on Supercomputing
Loop partitioning for distributed memory multiprocessors as unimodular transformations
ICS '91 Proceedings of the 5th international conference on Supercomputing
Semantical interprocedural parallelization: an overview of the PIPS project
ICS '91 Proceedings of the 5th international conference on Supercomputing
Extending the I test to direction vectors
ICS '91 Proceedings of the 5th international conference on Supercomputing
Uniform techniques for loop optimization
ICS '91 Proceedings of the 5th international conference on Supercomputing
Analysis and transformation in the ParaScope editor
ICS '91 Proceedings of the 5th international conference on Supercomputing
A unified framework for systematic loop transformations
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Time Optimal Linear Schedules for Algorithms with Uniform Dependencies
IEEE Transactions on Computers
Tiling multidimensional iteration spaces for nonshared memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Automatic partitioning of a program dependence graph into parallel tasks
IBM Journal of Research and Development
Analysis of free schedule in periodic graphs
SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Delinearization: an efficient way to break multiloop dependence equations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A general framework for iteration-reordering loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
ICS '92 Proceedings of the 6th international conference on Supercomputing
On exact data dependence analysis
ICS '92 Proceedings of the 6th international conference on Supercomputing
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
Access normalization: loop restructuring for NUMA compilers
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Non-unimodular transformations of nested loops
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Access normalization: loop restructuring for NUMA computers
ACM Transactions on Computer Systems (TOCS)
Program optimization and parallelization using idioms
ACM Transactions on Programming Languages and Systems (TOPLAS)
Reducing data communication overhead for DOACROSS loop nests
ICS '94 Proceedings of the 8th international conference on Supercomputing
ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Compiler technology for parallel scientific computation
Scientific Programming
IEEE Transactions on Parallel and Distributed Systems
On Effective Execution of Nonuniform DOACROSS Loops
IEEE Transactions on Parallel and Distributed Systems
Achieving Full Parallelism Using Multidimensional Retiming
IEEE Transactions on Parallel and Distributed Systems
On the perfect accuracy of an approximate subscript analysis test
ICS '90 Proceedings of the 4th international conference on Supercomputing
Journal of VLSI Signal Processing Systems
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
A Unifying Lattice-Based Approach for the Partitioning of Systolic Arrays via LPGS and LSGP
Journal of VLSI Signal Processing Systems
Journal of VLSI Signal Processing Systems
Constraint based vectorization
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Alpha du centaur: a prototype environment for the design of parallel regular alorithms
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Program Improvement by Source-to-Source Transformation
Journal of the ACM (JACM)
Vectorization and parallelization interactive assistant
CSC '88 Proceedings of the 1988 ACM sixteenth annual conference on Computer science
A Space-Time Representation Method of Iterative Algorithms for the Design of Processor Arrays
Journal of VLSI Signal Processing Systems
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
Polygon rendering on a stream architecture
HWWS '00 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware
Finding Quadratic Schedules for Affine Recurrence Equations Via Nonsmooth Optimization
Journal of VLSI Signal Processing Systems
Chain Grouping: A Method for Partitioning Loops onto Mesh-Connected Processor Arrays
IEEE Transactions on Parallel and Distributed Systems
A preprocessing step for global loop transformations for data transfer optimization
CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
A Survey of Parallel Machine Organization and Programming
ACM Computing Surveys (CSUR)
ACM Transactions on Programming Languages and Systems (TOPLAS)
Glypnir—a programming language for Illiac IV
Communications of the ACM
Loop parallelization algorithms
Compiler optimizations for scalable parallel systems
Communication-free partitioning of nested loops
Compiler optimizations for scalable parallel systems
Scheduling reductions on realistic machines
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Enabling unimodular transformations
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Mapping Nested Loop Algorithms into Multidimensional Systolic Arrays
IEEE Transactions on Parallel and Distributed Systems
Synthesizing Nested Loop Algorithms Using Nonlinear Transformation Method
IEEE Transactions on Parallel and Distributed Systems
The I Test: An Improved Dependence Test for Automatic Parallelization and Vectorization
IEEE Transactions on Parallel and Distributed Systems
Partitioning and Mapping Nested Loops on Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays
IEEE Transactions on Parallel and Distributed Systems
Dependence Uniformization: A Loop Parallelization Technique
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
On Loop Transformations for Generalized Cycle Shrinking
IEEE Transactions on Parallel and Distributed Systems
Constructive Methods for Scheduling Uniform Loop Nests
IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers
IEEE Transactions on Parallel and Distributed Systems
A General Methodology of Partitioning and Mapping for Given Regular Arrays
IEEE Transactions on Parallel and Distributed Systems
High Level Software Synthesis of Affine Iterative Algorithms onto Parallel Architectures
HPCN Europe 2000 Proceedings of the 8th International Conference on High-Performance Computing and Networking
A BSP Approach to the Scheduling of Tightly-Nested Loops
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Structured Scheduling of Recurrence Equations: Theory and Practice
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
Complexity of Multi-dimensional Loop Alignment
STACS '02 Proceedings of the 19th Annual Symposium on Theoretical Aspects of Computer Science
On the Optimality of Feautrier's Scheduling Algorithm
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Efficient Execution of Doacross Loops on Distributed Memory Systems
PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Loop Transformations for Hierarchical Parallelism and Locality
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Structured scheduling of recurrence equations: theory and practice
Embedded processor design challenges
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
IEEE Transactions on Parallel and Distributed Systems
An introduction to processor-time-optimal systolic arrays
Highly parallel computaions
Scheduling in Co-Partitioned Array Architectures
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
Efficient mapping of algorithms to single-stage interconnections
ISCA '80 Proceedings of the 7th annual symposium on Computer Architecture
Using an oracle to measure potential parallelism in single instruction stream programs
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Data broadcasting in linearly scheduled array processors
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
The ILLIAC IV FORTRAN compiler
Proceedings of the conference on Programming languages and compilers for parallel and vector machines
The Paralyzer: Ivtran's Parallelism Analyzer and Synthesizer
Proceedings of the conference on Programming languages and compilers for parallel and vector machines
On programming parallel computers
Proceedings of the conference on Programming languages and compilers for parallel and vector machines
A CAD system for unified hardware-software design
DAC '75 Proceedings of the 12th Design Automation Conference
Multiprocessor software design
ACM '80 Proceedings of the ACM 1980 annual conference
Program improvement by source to source transformation
POPL '76 Proceedings of the 3rd ACM SIGACT-SIGPLAN symposium on Principles on programming languages
On the parallelization of loop nests containing while loops
PAS '95 Proceedings of the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis
Mapping deep nested do-loop DSP algorithms to large scale FPGA array structures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Users' experience with the ILLIAC IV system and its programming languages
ACM SIGPLAN Notices
Configware and morphware going mainstream
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Reconfigurable systems
An experimental evaluation of scalar replacement on scientific benchmarks
Software—Practice & Experience
The digital divide of computing
Proceedings of the 1st conference on Computing frontiers
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Improving register allocation for subscripted variables
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Mapping rectangular mesh algorithms onto asymptotically space-optimal arrays
Journal of Parallel and Distributed Computing
Optimizing the memory bandwidth with loop fusion
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Journal of Systems and Software - Special issue: Software engineering education and training
Scalarization using loop alignment and loop skewing
The Journal of Supercomputing
International Journal of High Performance Computing Applications
Iterational retiming: maximize iteration-level parallelism for nested loops
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Reuse analysis of indirectly indexed arrays
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Examining DCSP coordination tradeoffs
AAMAS '06 Proceedings of the fifth international joint conference on Autonomous agents and multiagent systems
The Journal of Supercomputing
Proceedings of the 20th annual international conference on Supercomputing
Single-dimension software pipelining for multidimensional loops
ACM Transactions on Architecture and Code Optimization (TACO)
Journal of Systems and Software
The rise and fall of High Performance Fortran: an historical object lesson
Proceedings of the third ACM SIGPLAN conference on History of programming languages
Maximize Parallelism Minimize Overhead for Nested Loops via Loop Striping
Journal of VLSI Signal Processing Systems
An Efficient Code Generation Algorithm for Non-orthogonal DSP Architecture
Journal of VLSI Signal Processing Systems
Implementing fine grain processor arrays on field-programmable logic
Integrated Computer-Aided Engineering
MPSoC memory optimization using program transformation
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Efficient implementation of nested-loop multimedia algorithms
EURASIP Journal on Applied Signal Processing
A Restructurable Computer System
IEEE Transactions on Computers
On the Analysis and Synthesis of VLSI Algorithms
IEEE Transactions on Computers
Measuring the Parallelism Available for Very Long Instruction Word Architectures
IEEE Transactions on Computers
Register allocation for software pipelined multidimensional loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Cronus: A platform for parallel code generation based on computational geometry methods
Journal of Systems and Software
Journal of Parallel and Distributed Computing
Optimizing parallelism for nested loops with iterational and instructional retiming
Journal of Embedded Computing - Selected papers of EUC 2005
Precise Management of Scratchpad Memories for Localising Array Accesses in Scientific Codes
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
WARPP: a toolkit for simulating high-performance parallel scientific codes
Proceedings of the 2nd International Conference on Simulation Tools and Techniques
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Automatic Parallelization and Optimization of Programs by Proof Rewriting
SAS '09 Proceedings of the 16th International Symposium on Static Analysis
On control signals for multi-dimensional time
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Pipelined parallelization in HPF programs on the earth simulator
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Speculative parallelization using state separation and multiple value prediction
Proceedings of the 2010 international symposium on Memory management
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Structured parallel programming with deterministic patterns
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Automatic verification of determinism for structured parallel programs
SAS'10 Proceedings of the 17th international conference on Static analysis
Geometric scheduling of 2-D UET-UCT uniform dependence loops
EUROMICRO-PDP'02 Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing
Automatic code generation for distributed memory architectures in the polytope model
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Performance analysis of a hybrid MPI/CUDA implementation of the NASLU benchmark
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Loop striping: maximize parallelism for nested loops
EUC'06 Proceedings of the 2006 international conference on Embedded and Ubiquitous Computing
Optimizing nested loops with iterational and instructional retiming
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Proceedings of the International Conference on Computer-Aided Design
Automatic detection of saturation and clipping idioms
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Synthesising graphics card programs from DSLs
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
A direct method for optimal VLSI realization of deeply nested n-D loop problems
Microprocessors & Microsystems
Integrating profile-driven parallelism detection and machine-learning-based mapping
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 48.28 |
Methods are developed for the parallel execution of different iterations of a DO loop. Both asynchronous multiprocessor computers and array computers are considered. Practical application to the design of compilers for such computers is discussed.