Embedded systems very often demand code with a small memory footprint. A popular architectural modification to improve code density in RISC embedded processors is to use a dual instruction set. This approach reduces code size at the cost of performance, because more reduced-width instructions are required to execute the same task. We propose a novel alternative for reducing code size: a queue machine with a single reduced instruction set. We present an efficient code generation algorithm that inserts the additional instructions needed to execute programs in the reduced instruction set. Our experiments show that few additional instructions are inserted, and we demonstrate code size reductions of 16% over MIPS16, 26% over Thumb, and 50% over MIPS32 code. Furthermore, we show that our compiler, without any optimization, is able to extract about the same parallelism as fully optimized RISC code.
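To illustrate why a queue machine lends itself to short instructions, here is a minimal sketch of the general queue computation model. It is not the code generation algorithm of the paper, and the function names, the opcode names (`ld`, `add`, `sub`, `mul`), and the tuple-based tree encoding are all assumptions made for this illustration. Operands are taken implicitly from the front of a FIFO queue and results are appended at the rear, so arithmetic instructions carry no operand fields. For an expression tree, a level-order traversal (deepest level first, left to right) yields a valid queue program. The extra instructions the abstract mentions are not modeled here.

```python
from collections import deque

# Hypothetical opcode table for this sketch (not the paper's instruction set).
OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
}

def level_order_program(tree):
    """Turn an expression tree into a queue program.

    A tree is either a leaf (a variable name) or a tuple (op, left, right).
    Nodes are emitted level by level, deepest level first, left to right.
    """
    levels = []
    frontier = [tree]
    while frontier:
        levels.append(frontier)
        next_frontier = []
        for node in frontier:
            if isinstance(node, tuple):
                next_frontier.extend(node[1:])  # children of an operator node
        frontier = next_frontier

    program = []
    for level in reversed(levels):
        for node in level:
            if isinstance(node, tuple):
                program.append((node[0],))    # operator: no operand fields
            else:
                program.append(("ld", node))  # leaf: load a variable
    return program

def run_queue(program, env):
    """Execute a queue program: dequeue operands at the front, enqueue results at the rear."""
    queue = deque()
    for instr in program:
        if instr[0] == "ld":
            queue.append(env[instr[1]])
        else:
            left = queue.popleft()
            right = queue.popleft()
            queue.append(OPS[instr[0]](left, right))
    return queue.popleft()

# (a + b) * (c - d)
expr = ("mul", ("add", "a", "b"), ("sub", "c", "d"))
program = level_order_program(expr)
result = run_queue(program, {"a": 2, "b": 3, "c": 7, "d": 4})
```

For this expression the generated program loads `a`, `b`, `c`, `d` and then issues `add`, `sub`, `mul`. The `add` and `sub` instructions read disjoint queue entries, which hints at how such code can expose parallelism without register renaming.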