Queue computers use a FIFO data structure for data processing. The essential characteristics of a queue-based architecture are well suited to the demands of embedded systems: a compact instruction set, simple hardware logic, high parallelism, and low power consumption. The size of the queue is an important concern in the design of a realizable embedded queue processor. We describe the relationship between parallelism, the length of data dependency edges in data flow graphs, and queue utilization requirements. This paper presents a technique that makes the compiler aware of the size of the queue register file so that it can optimize programs to use the available hardware effectively. The compiler examines the data flow graph of a program and partitions it into clusters whenever it exceeds the queue limits of the target architecture. The presented algorithm addresses the two factors that affect queue utilization, namely parallelism and the length of variables' reaching definitions. We analyze how the quality of the generated code is affected for SPEC CINT95 benchmark programs under different queue size configurations. Our results show that, for reasonable queue sizes, the compiler generates code comparable to the code generated for infinite resources in terms of instruction count, static execution time, and instruction-level parallelism.
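As a rough illustration of why parallelism drives queue utilization, the following minimal sketch evaluates an expression tree on a software FIFO and records the peak number of occupied queue entries. It is not the clustering algorithm the paper presents. The names `level_order_program` and `run_queue_program`, the nested-tuple tree encoding, and the three-operator instruction set are all assumptions made for this example. The program order used here (deepest tree level first, left to right) is the usual way to obtain a valid queue program from an expression tree.

```python
from collections import deque
import operator

# Hypothetical instruction set for this sketch: three binary operators.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def level_order_program(tree):
    """Flatten an expression tree into a queue program.

    A tree is either a number (leaf) or a tuple (op, left, right).
    Nodes are emitted deepest level first, left to right within a level.
    """
    levels = []
    frontier = [tree]
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for node in frontier:
            if isinstance(node, tuple):
                _, left, right = node
                next_frontier.extend([left, right])
        frontier = next_frontier

    program = []
    for level in reversed(levels):
        for node in level:
            # Operators become their symbol; leaves become "load constant".
            program.append(node[0] if isinstance(node, tuple) else node)
    return program


def run_queue_program(program):
    """Execute the program on a FIFO; return (result, peak queue occupancy)."""
    queue = deque()
    peak = 0
    for instr in program:
        if isinstance(instr, str):
            # Operands are taken from the head, the result goes to the tail.
            a = queue.popleft()
            b = queue.popleft()
            queue.append(OPS[instr](a, b))
        else:
            queue.append(instr)
        peak = max(peak, len(queue))
    return queue.popleft(), peak


# (1 + 2) * (7 - 3): four independent loads sit in the queue at once.
wide = ("*", ("+", 1, 2), ("-", 7, 3))
print(run_queue_program(level_order_program(wide)))  # (12, 4)
```

In this toy model the peak occupancy equals the widest stretch of values that are produced but not yet consumed, so a wider (more parallel) tree needs a larger queue. A compiler targeting a fixed-size queue register file would have to restructure or split such a graph once that peak exceeds the hardware limit, which is the situation the abstract describes.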