Speculative DMA for architecturally visible storage in instruction set extensions
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Way Stealing: cache-assisted automatic instruction set extensions
Proceedings of the 46th Annual Design Automation Conference
Modern development methods and tools for embedded reconfigurable systems: A survey
Integration, the VLSI Journal
Proceedings of the 2009 International Conference on Computer-Aided Design
Design-space exploration of resource-sharing solutions for custom instruction set extensions
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
A memory interface for multi-purpose multi-stream accelerators
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Architecture support for custom instructions with memory operations
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Hi-index | 0.03 |
Instruction set extensions (ISEs) can be used effectively to accelerate the performance of embedded processors. The critical and difficult task of ISE selection is often performed manually by designers. A few automatic methods for ISE generation have shown good capabilities but are still limited in the handling of memory accesses, and so they fail to directly address the memory wall problem. We present here the first ISE identification technique that can automatically identify state-holding application-specific functional units (AFUs) comprehensively, thus being able to eliminate a large portion of memory traffic from cache and the main memory. Our cycle-accurate results obtained by the SimpleScalar simulator show that the identified AFUs with architecturally visible storage gain significantly more than previous techniques and achieve an average speedup of 2.8times over pure software execution with a little area overhead. Moreover, the number of required memory-access instructions is reduced by two thirds on average, suggesting corresponding benefits on energy consumption