Highly concurrent scalar processing
Highly concurrent scalar processing
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Selected papers of the second workshop on Languages and compilers for parallel computing
Parallelization of loops with exits on pipelined architectures
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Circular scheduling: a new technique to perform software pipelining
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
GURPR*: a new global software pipelining algorithm
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Register allocation for software pipelined loops
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
An efficient resource-constrained global scheduling technique for superscalar and VLIW processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Code generation schema for modulo scheduled loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Sentinel scheduling: a model for compiler-controlled speculative execution
ACM Transactions on Computer Systems (TOCS)
The multiflow trace scheduling compiler
The Journal of Supercomputing - Special issue on instruction-level parallelism
The Cydra 5 minisupercomputer: architecture and implementation
The Journal of Supercomputing - Special issue on instruction-level parallelism
The Journal of Supercomputing - Special issue on instruction-level parallelism
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Loop optimization for horizontal microcoded machines
ICS '90 Proceedings of the 4th international conference on Supercomputing
A compilation technique for software pipelining of loops with conditional jumps
MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
A comparison of list schedules for parallel processing systems
Communications of the ACM
An efficient search algorithm to find the elementary circuits of a graph
Communications of the ACM
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
The Design and Analysis of Computer Algorithms
The Design and Analysis of Computer Algorithms
Data Flow and Dependence Analysis for Instruction Level Parallelism
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Acceleration of First and Higher Order Recurrences on Processors with Instruction Level Parallelism
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Scheduling Loops on Parallel Processors: A Simple Algorithm with Close to Optimum Performance
CONPAR '92/ VAPP V Proceedings of the Second Joint International Conference on Vector and Parallel Processing: Parallel Processing
A Polynomial Time Method for Optimal Software Pipelining
CONPAR '92/ VAPP V Proceedings of the Second Joint International Conference on Vector and Parallel Processing: Parallel Processing
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
A technique of global optimization of microprograms
MICRO 11 Proceedings of the 11th annual workshop on Microprogramming
Scheduling and mapping: software pipelining in the presence of structural hazards
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
ACM Computing Surveys (CSUR)
Optimum modulo schedules for minimum register requirements
ICS '95 Proceedings of the 9th international conference on Supercomputing
Petri net versus modulo scheduling for software pipelining
Proceedings of the 28th annual international symposium on Microarchitecture
Modulo scheduling with multiple initiation intervals
Proceedings of the 28th annual international symposium on Microarchitecture
Region-based compilation: an introduction and motivation
Proceedings of the 28th annual international symposium on Microarchitecture
Register allocation for predicated code
Proceedings of the 28th annual international symposium on Microarchitecture
Unrolling-based optimizations for modulo scheduling
Proceedings of the 28th annual international symposium on Microarchitecture
Stage scheduling: a technique to reduce the register requirements of a modulo schedule
Proceedings of the 28th annual international symposium on Microarchitecture
Hypernode reduction modulo scheduling
Proceedings of the 28th annual international symposium on Microarchitecture
Software pipelining showdown: optimal vs. heuristic methods in a production compiler
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
A reduced multipipeline machine description that preserves scheduling constraints
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Speculative hedge: regulating compile-time speculation against profile variations
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Heuristics for register-constrained software pipelining
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Software pipelining loops with conditional branches
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Meld scheduling: relaxing scheduling constraints across region boundaries
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Optimization of machine descriptions for efficient use
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Efficient formulation for optimal modulo schedulers
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Increasing memory bandwidth with wide buses: compiler, hardware and performance trade-offs
ICS '97 Proceedings of the 11th international conference on Supercomputing
Cache sensitive modulo scheduling
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A general algorithm for tiling the register level
ICS '98 Proceedings of the 12th international conference on Supercomputing
Resource widening versus replication: limits and performance-cost trade-off
ICS '98 Proceedings of the 12th international conference on Supercomputing
Optimal Modulo Scheduling Through Enumeration
International Journal of Parallel Programming
Modulo Scheduling with Reduced Register Pressure
IEEE Transactions on Computers
Reducing Data Hazards on Multi-pipelined DSP Architecture with Loop Scheduling
Journal of VLSI Signal Processing Systems - Special issue on future directions in the design and implementations of DSP systems
Quantitative Evaluation of Register Pressure on Software Pipelined Loops
International Journal of Parallel Programming
Split-path enhanced pipeline scheduling for loops with control flows
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Effective cluster assignment for modulo scheduling
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Widening resources: a cost-effective technique for aggressive ILP architectures
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
MPS: Miss-Path Scheduling for Multiple-Issue Processors
IEEE Transactions on Computers
Modulo scheduling for the TMS320C6x VLIW DSP architecture
Proceedings of the ACM SIGPLAN 1999 workshop on Languages, compilers, and tools for embedded systems
Probabilistic Loop Scheduling for Applications with Uncertain Execution Time
IEEE Transactions on Computers
A recursive time estimation algorithm for program traces under resource constraints
SAC '98 Proceedings of the 1998 ACM symposium on Applied Computing
Optimized unrolling of nested loops
Proceedings of the 14th international conference on Supercomputing
Improved spill code generation for software pipelined loops
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
IEEE Transactions on Computers
Two-level hierarchical register file organization for VLIW processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Lifetime-Sensitive Modulo Scheduling in a Production Environment
IEEE Transactions on Computers
Register pressure responsive software pipelining
Proceedings of the 2001 ACM symposium on Applied computing
Minimizing Average Schedule Length under Memory Constraints by Optimal Partitioning and Prefetching
Journal of VLSI Signal Processing Systems
Power-aware modulo scheduling for high-performance VLIW processors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Scheduling time-constrained instructions on pipelined processors
ACM Transactions on Programming Languages and Systems (TOPLAS)
Loop Transformations for Architectures with Partitioned Register Banks
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
ShiftQ: a bufferred interconnect for custom loop accelerators
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
IEEE Transactions on Computers
Evaluating the Use of Register Queues in Software Pipelined Loops
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Loop fusion for clustered VLIW architectures
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
A comparative study of modulo scheduling techniques
ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling with integrated register spilling for clustered VLIW architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Enhancing loop buffering of media and telecommunications applications using low-overhead predication
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Optimized Unrolling of Nested Loops
International Journal of Parallel Programming
Cycle-time aware architecture synthesis of custom hardware accelerators
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Code coverage and input variability: effects on architecture and compiler research
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
On achieving balanced power consumption in software pipelined loops
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
CALiBeR: a software pipelining algorithm for clustered embedded VLIW processors
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Enhanced Co-Scheduling: A Software Pipelining Method Using Modulo-Scheduled Pipeline Theory
International Journal of Parallel Programming
Handling Global Constraints in Compiler Strategy
International Journal of Parallel Programming
Computation in the Context of Transport Triggered Architectures
International Journal of Parallel Programming
Meld Scheduling: A Technique for Relaxing Scheduling Constraints
International Journal of Parallel Programming
Optimization of Machine Descriptions for Efficient Use
International Journal of Parallel Programming
Control Flow Regeneration for Software Pipelined Loops with Conditions
International Journal of Parallel Programming
The Intel IA-64 Compiler Code Generator
IEEE Micro
A finite state machine based format model of software pipelined loops with conditions
Progress in computer research
Loop Shifting for Loop Compaction
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Optimizing Loop Performance for Clustered VLIW Architectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Pseudo-vectorizing Compiler for the SR8000 (Research Note)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Influence of Variable Time Operations in Static Instruction Scheduling
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Global Software Pipelining with Iteration Preselection
CC '00 Proceedings of the 9th International Conference on Compiler Construction
Software Pipelining of Nested Loops
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Reduced code size modulo scheduling in the absence of hardware support
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Predicate-aware scheduling: a technique for reducing resource constraints
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Split-Path Enhanced Pipeline Scheduling
IEEE Transactions on Parallel and Distributed Systems
Adapting instruction level parallelism for optimizing leakage in VLIW architectures
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Meta optimization: improving compiler heuristics with machine learning
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Region-based hierarchical operation partitioning for multicluster processors
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Co-Scheduling Hardware and Software Pipelines
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Software Pipelining for Coarse-Grained Reconfigurable Instruction Set Processors
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
RTGEN: An Algorithm for Automatic Generation of Reservation Tables from Architectural Descriptions
Proceedings of the 12th international symposium on System synthesis
Register-Sensitive Software Pipelining
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
An Enhanced Co-Scheduling Method using Reduced MS-State Diagrams
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Code size reduction technique and implementation for software-pipelined DSP applications
ACM Transactions on Embedded Computing Systems (TECS)
RTGEN: an algorithm for automatic generation of reservation tables from architectural descriptions
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Custom Wide Counterflow Pipelines for High-Performance Embedded Applications
IEEE Transactions on Computers
An experimental evaluation of scalar replacement on scientific benchmarks
Software—Practice & Experience
Register Constrained Modulo Scheduling
IEEE Transactions on Parallel and Distributed Systems
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Probabilistic Predicate-Aware Modulo Scheduling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A loop accelerator for low power embedded VLIW processors
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Dynamic loop pipelining in data-driven architectures
Proceedings of the 2nd conference on Computing frontiers
Demystifying on-the-fly spill code
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Automatically partitioning packet processing applications for pipelined architectures
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Automatic multithreading and multiprocessing of C programs for IXP
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Distributed Data Cache Designs for Clustered VLIW Processors
IEEE Transactions on Computers
Cost Sensitive Modulo Scheduling in a Loop Accelerator Synthesis System
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Combining module selection and resource sharing for efficient FPGA pipeline synthesis
Proceedings of the 2006 ACM/SIGDA 14th international symposium on Field programmable gate arrays
Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming
Register aware scheduling for distributed cache clustered architecture
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Specific optimization features in a C compiler for DSPs
Programming and Computing Software
Compiler transformations for effectively exploiting a zero overhead loop buffer
Software—Practice & Experience
Generic software pipelining at the assembly level
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
Streamroller:: automatic synthesis of prescribed throughput accelerator pipelines
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Increasing hardware efficiency with multifunction loop accelerators
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Merging Head and Tail Duplication for Convergent Hyperblock Formation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Single-dimension software pipelining for multidimensional loops
ACM Transactions on Architecture and Code Optimization (TACO)
Hybrid multi-core architecture for boosting single-threaded performance
ACM SIGARCH Computer Architecture News
A unified evaluation framework for coarse grained reconfigurable array architectures
Proceedings of the 4th international conference on Computing frontiers
Partitioning and scheduling DSP applications with maximal memory access hiding
EURASIP Journal on Applied Signal Processing
Efficient implementation of nested-loop multimedia algorithms
EURASIP Journal on Applied Signal Processing
Non-transparent debugging for software-pipelined loops
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Application driven embedded system design: a face recognition case study
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Hierarchical coarse-grained stream compilation for software defined radio
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Facilitating compiler optimizations through the dynamic mapping of alternate register structures
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Code-size conscious pipelining of imperfectly nested loops
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Latency-tolerant software pipelining in a production compiler
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Modulo scheduling for highly customized datapaths to increase hardware reusability
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Optimal vs. heuristic integrated code generation for clustered VLIW architectures
SCOPES '08 Proceedings of the 11th international workshop on Software & compilers for embedded systems
Optimizing code through iterative specialization
Proceedings of the 2008 ACM symposium on Applied computing
Proceedings of the 2008 ACM symposium on Applied computing
Optimized mapping for enchancing the operation parallelism in coarse-grained reconfigurable arrays
SMO'06 Proceedings of the 6th WSEAS International Conference on Simulation, Modelling and Optimization
Rotating register allocation with multiple rotating branches
Proceedings of the 22nd annual international conference on Supercomputing
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Register allocation for software pipelined multidimensional loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
VEAL: Virtualized Execution Accelerator for Loops
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
DVFS in loop accelerators using BLADES
Proceedings of the 45th annual Design Automation Conference
Programming Reconfigurable Decoupled Application Control Accelerator for Mobile Systems
ARC '08 Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications
Timing optimization via nest-loop pipelining considering code size
Microprocessors & Microsystems
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture
Languages and Compilers for Parallel Computing
Coordinated concurrent memory accesses on a reconfigurable multimedia accelerator
Microprocessors & Microsystems
Integrated Modulo Scheduling for Clustered VLIW Architectures
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Software Pipelining in Nested Loops with Prolog-Epilog Merging
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
SPR: an architecture-adaptive CGRA mapping tool
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Resource aware mapping on coarse grained reconfigurable arrays
Microprocessors & Microsystems
Design and implementation of a queue compiler
Microprocessors & Microsystems
Proceedings of the 6th ACM conference on Computing frontiers
Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays
The Journal of Supercomputing
Modulo scheduling without overlapped lifetimes
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Synergistic execution of stream programs on multicores with accelerators
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Stream Compilation for Real-Time Embedded Multicore Systems
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
CGRA express: accelerating execution using dynamic operation fusion
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Scheduling with soft constraints
Proceedings of the 2009 International Conference on Computer-Aided Design
Automatic memory partitioning and scheduling for throughput and power optimization
Proceedings of the 2009 International Conference on Computer-Aided Design
Input-driven dynamic execution prediction of streaming applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Compiling for reconfigurable computing: A survey
ACM Computing Surveys (CSUR)
CC'07 Proceedings of the 16th international conference on Compiler construction
Genetic programming applied to compiler heuristic optimization
EuroGP'03 Proceedings of the 6th European conference on Genetic programming
The implementation of a coarse-grained reconfigurable architecture with loop self-pipelining
ARC'07 Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications
MIRS: modulo scheduling with integrated register spilling
LCPC'01 Proceedings of the 14th international conference on Languages and compilers for parallel computing
Minimizing communication in rate-optimal software pipelining for stream programs
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
An algorithm to improve parallelism in distributed systems using asynchronous calls
PPAM'07 Proceedings of the 7th international conference on Parallel processing and applied mathematics
Towards a source level compiler: source level modulo scheduling
Program analysis and compilation, theory and practice
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Reducing Memory Constraints in Modulo Scheduling Synthesis for FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
MEDICS: ultra-portable processing for medical image reconstruction
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Reuse-aware modulo scheduling for stream processors
Proceedings of the Conference on Design, Automation and Test in Europe
Resource recycling: putting idle resources to work on a composable accelerator
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Fine-grain dynamic instruction placement for L0 scratch-pad memory
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Compilers, architectures and synthesis for embedded computing: retrospect and prospect
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A design scheme for a reconfigurable accelerator implemented by single-flux quantum circuits
Journal of Systems Architecture: the EUROMICRO Journal
A graph drawing based spatial mapping algorithm for coarse-grained reconfigurable architectures
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Precedence constraint posting for cyclic scheduling problems
CPAIOR'11 Proceedings of the 8th international conference on Integration of AI and OR techniques in constraint programming for combinatorial optimization problems
The Journal of Supercomputing
An FPGA-based heterogeneous coarse-grained dynamically reconfigurable architecture
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
A constraint based approach to cyclic RCPSP
CP'11 Proceedings of the 17th international conference on Principles and practice of constraint programming
ACM Transactions on Embedded Computing Systems (TECS)
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Combined ILP and register tiling: analytical model and optimization framework
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Improving performance of nested loops on reconfigurable array processors
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors
The Journal of Supercomputing
SCAN: a heuristic for near-optimal software pipelining
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Multi-dimensional kernel generation for loop nest software pipelining
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Dataflow-driven execution control in a coarse-grained reconfigurable array (abstract only)
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Integrated Code Generation for Loops
ACM Transactions on Embedded Computing Systems (TECS)
EPIMap: using epimorphism to map applications on CGRAs
Proceedings of the 49th Annual Design Automation Conference
Mathematical and Computer Modelling: An International Journal
Global cyclic cumulative constraint
CPAIOR'12 Proceedings of the 9th international conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems
Function inlining and loop unrolling for loop acceleration in reconfigurable processors
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Compositional approach applied to loop specialization
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
High performance FFT on SGI Altix 3700
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Execution Time Optimization Using Delayed Multidimensional Retiming
DS-RT '12 Proceedings of the 2012 IEEE/ACM 16th International Symposium on Distributed Simulation and Real Time Applications
Memory partitioning and scheduling co-optimization in behavioral synthesis
Proceedings of the International Conference on Computer-Aided Design
The resource-constrained modulo scheduling problem: an experimental study
Computational Optimization and Applications
Libra: Tailoring SIMD Execution Using Heterogeneous Hardware and Dynamic Configurability
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
REGIMap: register-aware application mapping on coarse-grained reconfigurable architectures (CGRAs)
Proceedings of the 50th Annual Design Automation Conference
Throughput-memory footprint trade-off in synthesis of streaming software on embedded multiprocessors
ACM Transactions on Embedded Computing Systems (TECS)
The benefits of using variable-length pipelined operations in high-level synthesis
ACM Transactions on Embedded Computing Systems (TECS)
Just-In-Time Software Pipelining
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Fast modulo scheduler utilizing patternized routes for coarse-grained reconfigurable architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Evaluator-executor transformation for efficient pipelining of loops with conditionals
ACM Transactions on Architecture and Code Optimization (TACO)
SDC-based modulo scheduling for pipeline synthesis
Proceedings of the International Conference on Computer-Aided Design
CROSS cyclic resource-constrained scheduling solver
Artificial Intelligence
Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture
Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems
Hi-index | 0.02 |
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.