Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The SPARC architecture manual: version 8
The SPARC architecture manual: version 8
MIPS RISC architectures
Register allocation with instruction scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Partitioned register file for TTAs
Proceedings of the 28th annual international symposium on Microarchitecture
Advanced compiler design and implementation
Advanced compiler design and implementation
Quantitative Evaluation of Register Pressure on Software Pipelined Loops
International Journal of Parallel Programming
Evolution and evaluation of SPEC benchmarks
ACM SIGMETRICS Performance Evaluation Review
Stack and Queue Layouts of Directed Acyclic Graphs: Part I
SIAM Journal on Computing
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Loop Transformations for Architectures with Partitioned Register Banks
OM '01 Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Evaluating the Use of Register Queues in Software Pipelined Loops
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
The Alpha 21264 Microprocessor
IEEE Micro
Thumb: Reducing the Cost of 32-bit RISC Performance in Portable and Consumer Applications
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
Queue Machines: Hardware Compilation in Hardware
FCCM '02 Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Partitioning Variables across Register Windows to Reduce Spill Code in a Low-Power Processor
IEEE Transactions on Computers
Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming
High-Level Modeling and FPGA Prototyping of Produced Order Parallel Queue Processor Core
The Journal of Supercomputing
The QC-2 parallel Queue processor architecture
Journal of Parallel and Distributed Computing
Design and architecture for an embedded 32-bit QueueCore
Journal of Embedded Computing - Issues in embedded single-chip multicore architectures
Hi-index | 0.00 |
This work presents a static method implemented in a compiler for extracting high instruction level parallelism for the 32-bit QueueCore, a queue computation-based processor. The instructions of a queue processor implicitly read and write their operands, making instructions short and the programs free of false dependencies. This characteristic allows the exploitation of maximum parallelism and improves code density. Compiling for the QueueCore requires a new approach since the concept of registers disappears. We propose a new efficient code generation algorithm for the QueueCore. For a set of numerical benchmark programs, our compiler extracts more parallelism than the optimizing compiler for an RISC machine by a factor of 1.38. Through the use of QueueCore's reduced instruction set, we are able to generate 20% and 26% denser code than two embedded RISC processors.