Reuse of cores can reduce design time for systems-on-a-chip. Such reuse depends on being able to interface a core easily to any bus. To enable such interfacing, many researchers propose separating a core's interface from its internals by means of a bus wrapper. However, this separation can impose a performance penalty when reading a core's internal registers. In this paper, we introduce prefetching, a technique analogous to caching, to reduce or eliminate this penalty in exchange for increased power and size. We describe the prefetching technique, classify different types of registers, describe our initial prefetching architectures and heuristics for certain classes of registers, and highlight experiments that demonstrate the performance improvements and the size/power tradeoffs. We further introduce a technique for automatically designing a prefetch unit that satisfies user-imposed register-access constraints. The technique benefits from mapping the prefetching problem to the well-known real-time process scheduling problem. We then extend the technique to allow user-specified register interdependencies, using a Petri net model, resulting in even more efficient prefetch schedules.
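The mapping from prefetching to real-time scheduling can be illustrated with a minimal Python sketch. It assumes a simplified constraint model that the abstract does not spell out. Each prefetchable register must be refreshed in the bus wrapper's prefetch buffer at least once in every aligned window of `period` bus cycles. Each prefetch occupies the core's internal bus for exactly one cycle, and the bus carries one prefetch per cycle. Under those assumptions, each register becomes a periodic task with unit execution time and an implicit deadline, and earliest-deadline-first (EDF) scheduling yields a cyclic prefetch schedule over the hyperperiod. The names `RegisterConstraint`, `utilization`, and `edf_prefetch_schedule` are hypothetical. This is not the paper's actual algorithm, and it ignores the Petri net extension for register interdependencies.

```python
from dataclasses import dataclass
from fractions import Fraction
from math import lcm
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class RegisterConstraint:
    """Hypothetical user constraint on one prefetchable core register."""
    name: str
    # Assumed model: the prefetch-buffer copy must be refreshed at least once
    # in every aligned window [k * period, (k + 1) * period) of bus cycles.
    period: int

    def __post_init__(self) -> None:
        if self.period < 1:
            raise ValueError("period must be a positive number of cycles")


def utilization(regs: Sequence[RegisterConstraint]) -> Fraction:
    # Each prefetch is assumed to occupy the internal bus for exactly 1 cycle,
    # so register i contributes 1 / period_i to total bus utilization.
    return sum((Fraction(1, r.period) for r in regs), Fraction(0))


def edf_prefetch_schedule(
    regs: Sequence[RegisterConstraint],
) -> Optional[List[Optional[str]]]:
    """Return one hyperperiod of a cyclic prefetch schedule (one entry per
    bus cycle, None = idle), or None if the constraints are infeasible."""
    if utilization(regs) > 1:
        return None
    hyperperiod = lcm(*(r.period for r in regs))
    pending: Dict[str, int] = {}  # register name -> absolute deadline
    schedule: List[Optional[str]] = []
    for t in range(hyperperiod):
        # Release a new prefetch job at the start of each register's window.
        for r in regs:
            if t % r.period == 0:
                if r.name in pending:  # previous window ended unserved
                    return None
                pending[r.name] = t + r.period
        if pending:
            # Earliest deadline first; ties broken by register name.
            name = min(pending, key=lambda n: (pending[n], n))
            del pending[name]
            schedule.append(name)
        else:
            schedule.append(None)
    return None if pending else schedule
```

For example, registers `A`, `B`, and `C` with periods 2, 4, and 4 give a total utilization of exactly 1, and the sketch returns the repeating slot sequence `['A', 'B', 'A', 'C']`. Periods 2, 2, and 3 exceed a utilization of 1, so no schedule exists. EDF is used here because, for periodic tasks with implicit deadlines on a single resource, it meets every deadline whenever total utilization is at most 1. With unit-length jobs released on integer cycles, no preemption is ever needed, which matches the indivisible nature of a bus read.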