Wide instruction formats let the compiler control microarchitectural resources more precisely, either to expose more parallelism (as in VLIW) or to save power. Unfortunately, wide instructions put high pressure on the memory system because they increase instruction-fetch bandwidth and enlarge the code working set (footprint). This paper presents a code compression scheme in which the compiler, guided by profiling, selects which subset of a wide instruction set to use in each program phase, at basic-block granularity. The decompression engine consists of a set of tables that dynamically expand narrow instructions into wide ones. The paper also presents a method for configuring and dimensioning the decompression engine, and for generating a compressed program with embedded instructions that dynamically manage the decompression tables. We find that the 77 control bits of the original FlexCore instruction format can be reduced to 32 bits, a 58% reduction (1 − 32/77 ≈ 0.58), with a modest performance overhead of less than 1% for managing the decompression tables.
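To make the table-based expansion concrete, here is a minimal Python sketch of a narrow-to-wide decompressor with embedded table-update instructions. Everything in it is a hypothetical illustration, not the paper's actual encoding: the split of the 77-bit control word into four slices (20, 20, 20, 17 bits), the 128-entry tables, the bit layout of the 32-bit narrow word, and the names `Decompressor`, `encode_expand` and `encode_update`. A narrow "expand" word carries one index per table and the slices it selects are concatenated into the wide word; a narrow "update" word writes a single table entry, standing in for the table-management instructions the compiler would embed.

```python
from typing import List, Optional, Sequence

# HYPOTHETICAL parameters for illustration only; not the paper's format.
SLICE_WIDTHS = (20, 20, 20, 17)   # four slices of the 77-bit wide control word
WIDE_BITS = sum(SLICE_WIDTHS)     # 77
INDEX_BITS = 7                    # 128 entries per table


def mask(bits: int) -> int:
    """Return an integer with the low `bits` bits set."""
    return (1 << bits) - 1


def encode_expand(indices: Sequence[int]) -> int:
    """Pack one table index per slice into a narrow 'expand' word (bit 31 = 0)."""
    word = 0
    for t, idx in enumerate(indices):
        assert 0 <= idx <= mask(INDEX_BITS)
        word |= idx << (t * INDEX_BITS)      # indices occupy bits 0..27
    return word


def encode_update(table: int, entry: int, value: int) -> int:
    """Pack a hypothetical table-management word (bit 31 = 1) writing one entry."""
    assert 0 <= table < len(SLICE_WIDTHS)
    assert 0 <= entry <= mask(INDEX_BITS)
    assert 0 <= value <= mask(SLICE_WIDTHS[table])
    # Bits 29-30: table, bits 22-28: entry, bits 0-19: slice value.
    return (1 << 31) | (table << 29) | (entry << 22) | value


class Decompressor:
    """Sketch of a table-based expander from narrow to wide instructions."""

    def __init__(self) -> None:
        # One table per slice; every entry starts as an all-zero slice.
        self.tables: List[List[int]] = [
            [0] * (1 << INDEX_BITS) for _ in SLICE_WIDTHS
        ]

    def execute(self, narrow: int) -> Optional[int]:
        """Process one 32-bit narrow word.

        Returns the expanded wide control word, or None when the word
        was a table update (which produces no wide instruction).
        """
        if (narrow >> 31) & 1:
            # Table update: overwrite one entry of one table.
            table = (narrow >> 29) & 0b11
            entry = (narrow >> 22) & mask(INDEX_BITS)
            self.tables[table][entry] = narrow & mask(SLICE_WIDTHS[table])
            return None
        # Expand: look up each slice and concatenate them, slice 0 lowest.
        wide, shift = 0, 0
        for t, width in enumerate(SLICE_WIDTHS):
            idx = (narrow >> (t * INDEX_BITS)) & mask(INDEX_BITS)
            wide |= self.tables[t][idx] << shift
            shift += width
        return wide
```

For example, four `encode_update(t, 5, value)` words that load entry 5 of each table, followed by `encode_expand((5, 5, 5, 5))`, reconstruct a full 77-bit control word from a single 32-bit fetch. The update words play the role of the embedded table-management instructions whose cost the abstract reports as below 1%. In the actual scheme, deciding which entries to load at which program phase is the compiler's profile-driven job, which this sketch does not model.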