Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays
IEEE Transactions on Computers
Direct parallelization of call statements
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Theory of linear and integer programming
Theory of linear and integer programming
Program partitioning and synchronization on multiprocessor systems
Program partitioning and synchronization on multiprocessor systems
The Organization of Computations for Uniform Recurrence Equations
Journal of the ACM (JACM)
The parallel execution of DO loops
Communications of the ACM
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Automatic discovery of linear restraints among variables of a program
POPL '78 Proceedings of the 5th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Dependence graphs and compiler optimizations
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Optimization and interconnection complexity for: parallel processors, single-stage networks, and decision trees
Optimizing supercompilers for supercomputers
Optimizing supercompilers for supercomputers
Compile-time scheduling and optimization for asynchronous machines (multiprocessor, compiler, parallel processing)
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Low overhead parallel schedules for task graphs
SPAA '90 Proceedings of the second annual ACM symposium on Parallel algorithms and architectures
Beyond loop partitioning: data assignment and overlap to reduce communication overhead
ICS '91 Proceedings of the 5th international conference on Supercomputing
Semantical interprocedural parallelization: an overview of the PIPS project
ICS '91 Proceedings of the 5th international conference on Supercomputing
Analysis and transformation in the ParaScope editor
ICS '91 Proceedings of the 5th international conference on Supercomputing
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Loop displacement: an approach for transforming and scheduling loops for parallel execution
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Time Optimal Linear Schedules for Algorithms with Uniform Dependencies
IEEE Transactions on Computers
Tiling multidimensional iteration spaces for nonshared memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Automatic partitioning of a program dependence graph into parallel tasks
IBM Journal of Research and Development
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A general framework for iteration-reordering loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Optimizing for parallelism and data locality
ICS '92 Proceedings of the 6th international conference on Supercomputing
Access normalization: loop restructuring for NUMA compilers
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Compiler blockability of numerical algorithms
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Non-unimodular transformations of nested loops
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
ACM SIGPLAN Notices - Workshop on languages, compilers and run-time environments for distributed memory multiprocessors
Access normalization: loop restructuring for NUMA computers
ACM Transactions on Computer Systems (TOCS)
Partitioning the global space for distributed memory systems
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
ICS '94 Proceedings of the 8th international conference on Supercomputing
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Optimal tile size adjustment in compiling general DOACROSS loop nests
ICS '95 Proceedings of the 9th international conference on Supercomputing
A limit study of local memory requirements using value reuse profiles
Proceedings of the 28th annual international symposium on Microarchitecture
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiler techniques for data partitioning of sequentially iterated parallel loops
ICS '90 Proceedings of the 4th international conference on Supercomputing
Optimal weighted loop fusion for parallel programs
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Designing a Scalable Processor Array for Recurrent Computations
IEEE Transactions on Parallel and Distributed Systems
Determining the idle time of a tiling
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic selection of high-order transformations in the IBM XL FORTRAN compilers
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
A Unifying Lattice-Based Approach for the Partitioning of Systolic Arrays via LPGS and LSGP
Journal of VLSI Signal Processing Systems
A general algorithm for tiling the register level
ICS '98 Proceedings of the 12th international conference on Supercomputing
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Schedule-independent storage mapping for loops
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Locality optimizations for multi-level caches
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Automated cache optimizations using CME driven diagnosis
Proceedings of the 14th international conference on Supercomputing
International Journal of Parallel Programming
IEEE Transactions on Parallel and Distributed Systems
Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers
The Journal of Supercomputing
Tiling optimizations for 3D scientific computations
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Optimizing locality for ODE solvers
ICS '01 Proceedings of the 15th international conference on Supercomputing
Reducing memory requirements of nested loops for embedded systems
Proceedings of the 38th annual Design Automation Conference
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
On tiling space-time mapped loop nests
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
A unified framework for schedule and storage optimization
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Loop parallelization algorithms
Compiler optimizations for scalable parallel systems
Optimal tiling for minimizing communication in distributed shared-memory multiprocessors
Compiler optimizations for scalable parallel systems
Communication-free partitioning of nested loops
Compiler optimizations for scalable parallel systems
Using Cohort Scheduling to Enhance Server Performance (Extended Abstract)
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Data Relation Vectors: A New Abstraction for Data Optimizations
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Automatic code generation for executing tiled nested loops onto parallel architectures
Proceedings of the 2002 ACM symposium on Applied computing
Automatic data and computation decomposition on distributed memory parallel computers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimal tiling for the RNA base pairing problem
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Automatic Partitioning of Parallel Loops with Parallelepiped-Shaped Tiles
IEEE Transactions on Parallel and Distributed Systems
An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets
The Journal of Supercomputing
PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators
Journal of VLSI Signal Processing Systems
Expressing cross-loop dependencies through hyperplane data dependence analysis
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Quantifying the Multi-Level Nature of Tiling Interactions
International Journal of Parallel Programming
Reuse-Driven Tiling for Improving Data Locality
International Journal of Parallel Programming
Data-Centric Transformations for Locality Enhancement
International Journal of Parallel Programming
Achieving Scalable Locality with Time Skewing
International Journal of Parallel Programming
Time-minimal tiling when rise is larger than zero
Parallel Computing
False Sharing and Spatial Locality in Multiprocessor Caches
IEEE Transactions on Computers
Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors
IEEE Transactions on Computers
Compile-Time Partitioning of Iterative Parallel Loops to Reduce Cache Coherency Traffic
IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Parallelizing Compilers on the Perfect Benchmarks Programs
IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
A General Methodology of Partitioning and Mapping for Given Regular Arrays
IEEE Transactions on Parallel and Distributed Systems
On Supernode Transformation with Minimized Total Running Time
IEEE Transactions on Parallel and Distributed Systems
On Time Optimal Supernode Shape
IEEE Transactions on Parallel and Distributed Systems
ESOP '02 Proceedings of the 11th European Symposium on Programming Languages and Systems
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
The Combined Effectiveness of Unimodular Transformations, Tiling, and Software Prefetching
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Optimized Execution of Fortran 90 Array Language on Symmetric Shared-Memory Multiprocessors
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Application of the Polytope Model to Functional Programs
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Exact Partitioning of Affine Dependence Algorithms
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
Solving Bi-knapsack Problem Using Tiling Approach for Dynamic Programming
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Storage Mapping Optimization for Parallel Programs
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Tiling and Memory Reuse for Sequences of Nested Loops
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
I/O-Conscious Tiling for Disk-Resident Data Sets
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Using Cohort-Scheduling to Enhance Server Performance
ATEC '02 Proceedings of the General Track of the annual conference on USENIX Annual Technical Conference
Analysis of Multithreaded Programs
SAS '01 Proceedings of the 8th International Symposium on Static Analysis
A Framework for Loop Distribution on Limited On-Chip Memory Processors
CC '00 Proceedings of the 9th International Conference on Compiler Construction
A Technique for FPGA Synthesis Driven by Automatic Source Code Analysis and Transformations
FPL '02 Proceedings of the Reconfigurable Computing Is Going Mainstream, 12th International Conference on Field-Programmable Logic and Applications
Loop Transformations for Hierarchical Parallelism and Locality
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Array Unification: A Locality Optimization Technique
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Optimal task scheduling at run time to exploit intra-tile parallelism
Parallel Computing
Exact partitioning of affine dependence algorithms
Embedded processor design challenges
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
IEEE Transactions on Parallel and Distributed Systems
Three-dimensional orthogonal tile sizing problem: mathematical programming approach
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications
IEEE Transactions on Computers
Journal of Parallel and Distributed Computing
Automatic parallel code generation for tiled nested loops
Proceedings of the 2004 ACM symposium on Applied computing
Improving register allocation for subscripted variables
ACM SIGPLAN Notices - Best of PLDI 1979-1999
A data locality optimizing algorithm
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Data Space Oriented Scheduling in Embedded Systems
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Quasidynamic Layout Optimizations for Improving Data Locality
IEEE Transactions on Parallel and Distributed Systems
A Geometric Programming Framework for Optimal Multi-Level Tiling
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs
The Journal of Supercomputing
A novel approach for partitioning iteration spaces with variable densities
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Teleport messaging for distributed stream programs
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Data space-oriented tiling for enhancing locality
ACM Transactions on Embedded Computing Systems (TECS)
Fast and efficient searches for effective optimization-phase sequences
ACM Transactions on Architecture and Code Optimization (TACO)
Sparse Tiling for Stationary Iterative Methods
International Journal of High Performance Computing Applications
A general approach for partitioning N-dimensional parallel nested loops with conditionals
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Locality and parallelism optimization for dynamic programming algorithm in bioinformatics
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Improving locality for ODE solvers by program transformations
Scientific Programming
A parallel dynamic programming algorithm on a multi-core architecture
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Effective automatic parallelization of stencil computations
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Parameterized tiled loops for free
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
MPSoC memory optimization using program transformation
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A step towards unifying schedule and storage optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Buffer and Register Allocation for Memory Space Optimization
Journal of VLSI Signal Processing Systems
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Multi-level tiling: M for the price of one
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Cronus: A platform for parallel code generation based on computational geometry methods
Journal of Systems and Software
Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Global Tiling for Communication Minimal Parallelization on Distributed Memory Systems
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Smashing: Folding Space to Tile through Time
Languages and Compilers for Parallel Computing
Reducing memory requirements of resource-constrained applications
ACM Transactions on Embedded Computing Systems (TECS)
Parametric multi-level tiling of imperfectly nested loops
Proceedings of the 23rd international conference on Supercomputing
Exploring parallelization strategies for NUFFT data translation
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Compact multi-dimensional kernel extraction for register tiling
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Loop transformations for reducing data space requirements of resource-constrained applications
SAS'03 Proceedings of the 10th international conference on Static analysis
Automatic creation of tile size selection models
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Parameterized tiling revisited
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Using non-canonical array layouts in dense matrix operations
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Selecting the tile shape to reduce the total communication volume
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Loop transformations: convexity, pruning and optimization
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A parallel numerical solver using hierarchically tiled arrays
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Locality optimization of stencil applications using data dependency graphs
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Automatic generation of fpga-specific pipelined accelerators
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
A geometric approach for partitioning n-dimensional non-rectangular iteration spaces
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Forward communication only placements and their use for parallel program construction
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Combining performance aspects of irregular gauss-seidel via sparse tiling
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Efficient tiled loop generation: D-tiling
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic C-to-CUDA code generation for affine programs
CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
Predictive modeling in a polyhedral optimization space
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
On-chip cache hierarchy-aware tile scheduling for multicore machines
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Analytical bounds for optimal tile size selection
CC'12 Proceedings of the 21st international conference on Compiler Construction
Distributed Shared Memory and Compiler-Induced Scalable Locality for Scalable Cluster Performance
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Tiling stencil computations to maximize parallelism
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Code generation for parallel execution of a class of irregular loops on distributed memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Locality optimized shared-memory implementations of iterated runge-kutta methods
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
FPGA-specific synthesis of loop-nests with pipelined computational cores
Microprocessors & Microsystems
A script-based autotuning compiler system to generate high-performance CUDA code
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Improved loop tiling based on the removal of spurious false dependences
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Improving last level cache locality by integrating loop and data transformations
Proceedings of the International Conference on Computer-Aided Design
Parallel schedule synthesis for attribute grammars
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hybrid Hexagonal/Classical Tiling for GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
A Case Study of Implementing Supernode Transformations
International Journal of Parallel Programming
Hi-index | 0.01 |
Supercompilers must reschedule computations defined by nested DO-loops in order to make an efficient use of supercomputer features (vector units, multiple elementary processors, cache memory, etc…). Many rescheduling techniques like loop interchange, loop strip-mining or rectangular partitioning have been described to speedup program execution. We present here a class of partitionings that encompasses previous techniques and provides enough flexibility to adapt code to multiprocessors with two levels of parallelism and two levels of memory.