Natural instruction level parallelism-aware compiler for high-performance QueueCore processor architecture

Authors:
Ben Abdallah Abderazek;Masashi Masuda;Arquimedes Canedo;Kenichi Kuroda
Affiliations:
School of Computer Science and Engineering, Adaptive Systems Laboratory, The University of Aizu, Aizu-Wakamatsu-shi, Japan 965-8580;School of Computer Science and Engineering, Adaptive Systems Laboratory, The University of Aizu, Aizu-Wakamatsu-shi, Japan 965-8580;School of Computer Science and Engineering, Adaptive Systems Laboratory, The University of Aizu, Aizu-Wakamatsu-shi, Japan 965-8580 and IBM Tokyo Research Laboratory, Yamato-shi, Japan 242-8502;School of Computer Science and Engineering, Adaptive Systems Laboratory, The University of Aizu, Aizu-Wakamatsu-shi, Japan 965-8580
Venue:
The Journal of Supercomputing
Year:
2011

Abstract

This work presents a static, compiler-implemented method for extracting high instruction-level parallelism for the 32-bit QueueCore, a processor based on queue computation. Queue processor instructions read and write their operands implicitly, which keeps instructions short and programs free of false dependencies. This characteristic allows the exploitation of maximum parallelism and improves code density. Because the concept of registers disappears, compiling for the QueueCore requires a new approach, and we propose an efficient code generation algorithm for it. For a set of numerical benchmark programs, our compiler extracts more parallelism than the optimizing compiler for a RISC machine by a factor of 1.38. Using the QueueCore's reduced instruction set, we generate code that is 20% and 26% denser than code for two embedded RISC processors.
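
To make the idea of implicit operands concrete, the following is a minimal sketch of the basic queue computation model, not the code generation algorithm proposed in the paper. It assumes a pure FIFO operand queue in which every operation takes its operands from the head and appends its result to the tail, and it derives the instruction order from a level-order traversal of the expression tree, deepest level first. The mnemonics (`ld`, `add`, `sub`, `mul`), the tuple encoding of the tree, and the helper names `levels`, `queue_program`, and `run` are all hypothetical and chosen only for illustration; they are not the QueueCore instruction set.

```python
from collections import deque
import operator

# Hypothetical mnemonics for illustration; not the QueueCore ISA.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def levels(tree):
    """Group expression-tree nodes by depth (root is level 0).

    A node is either a variable name (str) or a tuple (op, left, right).
    """
    result = []
    frontier = [tree]
    while frontier:
        result.append(frontier)
        next_frontier = []
        for node in frontier:
            if isinstance(node, tuple):
                _, left, right = node
                next_frontier.extend([left, right])
        frontier = next_frontier
    return result


def queue_program(tree):
    """Emit instructions level by level, deepest level first.

    No instruction names a register: operand locations are implied
    by the queue order alone.
    """
    program = []
    for level in reversed(levels(tree)):
        for node in level:
            if isinstance(node, tuple):
                program.append((node[0],))      # operator, no operand fields
            else:
                program.append(("ld", node))    # load a variable into the queue
    return program


def run(program, env):
    """Simulate a pure FIFO queue machine on the given program."""
    queue = deque()
    for instr in program:
        if instr[0] == "ld":
            queue.append(env[instr[1]])
        else:
            lhs = queue.popleft()               # operands come from the head
            rhs = queue.popleft()
            queue.append(OPS[instr[0]](lhs, rhs))  # result goes to the tail
    assert len(queue) == 1, "a well-formed program leaves exactly one value"
    return queue[0]


# (a + b) * (c - d)
expr = ("mul", ("add", "a", "b"), ("sub", "c", "d"))
program = queue_program(expr)
result = run(program, {"a": 1, "b": 2, "c": 7, "d": 3})
```

In this sketch, the instructions emitted for one tree level do not depend on each other, so the level structure itself exposes the available parallelism, and no operator carries register fields, which is one way to see why such instructions can be short. How the QueueCore compiler handles shared subexpressions and operand offsets is outside what this example models.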