Prefetching for improved bus wrapper performance in cores

Authors:
Roman Lysecky; Frank Vahid
Affiliations:
University of California, Riverside, CA; University of California, Riverside, and University of California, Irvine, CA
Venue:
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Year:
2002

Abstract

Reuse of cores can reduce design time for systems-on-a-chip. Such reuse is dependent on being able to easily interface a core to any bus. To enable such interfacing, many propose separating a core's interface from its internals by using a bus wrapper. However, this separation can lead to a performance penalty when reading a core's internal registers. In this paper, we introduce prefetching, which is analogous to caching, as a technique to reduce or eliminate this performance penalty, involving a tradeoff with power and size. We describe the prefetching technique, classify different types of registers, describe our initial prefetching architectures and heuristics for certain classes of registers, and highlight experiments demonstrating the performance improvements and size/power tradeoffs. We further introduce a technique for automatically designing a prefetch unit that satisfies user-imposed register-access constraints. The technique benefits from mapping the prefetching problem to the well-known real-time process scheduling problem. We then extend the technique to allow user-specified register interdependencies, using a Petri net model, resulting in even more efficient prefetch schedules.
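
The mapping from prefetching to real-time process scheduling mentioned above can be illustrated with a small sketch. The abstract does not say which scheduling test the authors use, so this example substitutes the classic rate-monotonic utilization bound as a stand-in, and it assumes each register can be modeled as a periodic task. The names `RegisterConstraint`, `period`, and `fetch_cycles` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegisterConstraint:
    """Hypothetical model of one core register as a periodic task."""
    name: str
    period: int        # assumed: the wrapper's copy must be refreshed every `period` cycles
    fetch_cycles: int  # assumed: cycles the prefetch unit needs to copy the register


def utilization(regs: List[RegisterConstraint]) -> float:
    """Fraction of the prefetch unit's time consumed by all refreshes."""
    return sum(r.fetch_cycles / r.period for r in regs)


def rm_bound(n: int) -> float:
    """Rate-monotonic utilization bound n * (2^(1/n) - 1) for n periodic tasks."""
    return n * (2 ** (1 / n) - 1)


def is_schedulable(regs: List[RegisterConstraint]) -> bool:
    """Sufficient (not necessary) test: True means the constraints can be met."""
    if not regs:
        return True
    return utilization(regs) <= rm_bound(len(regs))


def rate_monotonic_order(regs: List[RegisterConstraint]) -> List[RegisterConstraint]:
    """Assign priority by period: the shortest refresh period is served first."""
    return sorted(regs, key=lambda r: r.period)


if __name__ == "__main__":
    regs = [
        RegisterConstraint("CONFIG", period=50, fetch_cycles=5),
        RegisterConstraint("STATUS", period=10, fetch_cycles=2),
        RegisterConstraint("DATA", period=20, fetch_cycles=2),
    ]
    print("utilization:", round(utilization(regs), 3))
    print("schedulable:", is_schedulable(regs))
    print("priority order:", [r.name for r in rate_monotonic_order(regs)])
```

Because the bound is only a sufficient condition, a register set that fails it may still be schedulable. The sketch also ignores the register interdependencies that the abstract says are handled with a Petri net model.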