Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Proceedings of the 24th annual international symposium on Computer architecture
Communications of the ACM
Code compression based on operand factorization
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A code decompression architecture for VLIW processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Code compression for VLIW processors using variable-to-fixed coding
Proceedings of the 15th international symposium on System Synthesis
Instruction Pre-Processing in Trace Processors
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Compiler optimization and ordering effects on VLIW code compression
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Code Compression Based on Operand-Factorization for VLIW Processors
DCC '04 Proceedings of the Conference on Data Compression
Reflections on the memory wall
Proceedings of the 1st conference on Computing frontiers
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
2D-VLIW: An Architecture Based on the Geometry of Computation
ASAP '06 Proceedings of the IEEE 17th International Conference on Application-specific Systems, Architectures and Processors
Exploiting dynamic reconfiguration techniques: the 2D-VLIW approach
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Trimaran: an infrastructure for research in instruction-level parallelism
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Hi-index | 0.00 |
In this paper we propose a new technique to reduce the program footprint and the instruction fetch latency in high performance architectures adopting long instructions in the memory. Our technique is based on an algorithm that factors long instructions into instruction patterns and encoded instructions, which contains no redundant data and it is stored into an I-cache. The instruction patterns look like a map to the decode logic to prepare the instruction to be executed in the execution stages. These patterns are stored into a new cache (P-cache). We evaluated this technique in a high performance architecture called 2D-VLIW through trace-driven experiments with MediaBench and SPEC programs. We compared the 2D-VLIW execution time performance before and after the encoding, and also with other encoding techniques implemented in computer architectures. Experimental results reveal that our encoding strategy provides a program execution time that is up to 69% better than EPIC.