Dynamic coalescing for 16-bit instructions

Authors:
Arvind Krishnaswamy;Rajiv Gupta
Affiliations:
The University of Arizona, Tucson, AZ;The University of Arizona, Tucson, AZ
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2005

Abstract

In the embedded domain, memory usage and energy consumption are critical constraints. To address these concerns, embedded processors such as the ARM and MIPS provide a 16-bit instruction set (called Thumb in the ARM family of processors) in addition to the 32-bit instruction set. Using 16-bit instructions, one can achieve code size reduction and instruction cache energy savings at the cost of performance. This paper presents a novel approach that enhances the performance of 16-bit Thumb code. We have observed that throughout Thumb code there exist Thumb instruction pairs that are equivalent to a single ARM instruction. We have developed enhancements to the processor microarchitecture and the Thumb instruction set to exploit this property. We enhance the Thumb instruction set by incorporating Augmenting eXtensions (AX). The compiler replaces a Thumb instruction pair that can be combined into a single ARM instruction with an AXThumb instruction pair. At decode time, the AX instruction is coalesced with the immediately following Thumb instruction to generate a single ARM instruction. The enhanced microarchitecture ensures that coalescing does not introduce pipeline delays or increase cycle time, thereby reducing both instruction counts and cycle counts. Using AX instructions and coalescing hardware, we are also able to support efficient predicated execution in 16-bit mode.
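
The decode-time coalescing described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the abstract does not give AX encodings or say which instruction pairs are targeted, so the `AXShift`, `Thumb`, and `Arm` types and the `decode` function below are hypothetical names chosen for illustration. The example uses one familiar case, a shift followed by an add, which 32-bit ARM can express as a single data-processing instruction with a shifted operand.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class Thumb:
    """A simplified 16-bit three-register instruction (hypothetical model)."""
    op: str
    rd: str
    rn: str
    rm: str


@dataclass(frozen=True)
class AXShift:
    """Hypothetical AX instruction: carries a shift for the next instruction."""
    kind: str
    amount: int


@dataclass(frozen=True)
class Arm:
    """A simplified ARM-style operation with an optional shifted operand."""
    op: str
    rd: str
    rn: str
    rm: str
    shift: Optional[Tuple[str, int]] = None


def decode(stream: List[Union[Thumb, AXShift]]) -> List[Arm]:
    """Coalesce each AX instruction with the Thumb instruction that follows it."""
    out: List[Arm] = []
    pending: Optional[AXShift] = None
    for ins in stream:
        if isinstance(ins, AXShift):
            if pending is not None:
                raise ValueError("two AX instructions in a row")
            pending = ins  # hold the augmentation; emit nothing yet
            continue
        shift = (pending.kind, pending.amount) if pending else None
        out.append(Arm(ins.op, ins.rd, ins.rn, ins.rm, shift))
        pending = None
    if pending is not None:
        raise ValueError("AX instruction with no following Thumb instruction")
    return out


program = [
    AXShift("lsl", 2),               # augments the next instruction
    Thumb("add", "r0", "r1", "r2"),  # becomes: add r0, r1, r2, lsl #2
    Thumb("sub", "r4", "r4", "r5"),  # decoded on its own
]
for arm_op in decode(program):
    print(arm_op)
```

In this toy model three 16-bit instructions decode into two ARM-style operations, which mirrors the instruction-count reduction the abstract claims. It says nothing about pipeline timing, which the abstract attributes to the enhanced microarchitecture.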