Digital integrated circuits: a design perspective
Digital integrated circuits: a design perspective
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Neon: a single-chip 3D workstation graphics accelerator
HWWS '98 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware
Prefetching in a texture cache architecture
HWWS '98 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware
A scalable front-end architecture for fast instruction delivery
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Fetch directed instruction prefetching
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Optimizations Enabled by a Decoupled Front-End Architecture
IEEE Transactions on Computers
Micro-operation cache: a power aware frontend for the variable instruction length ISA
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
The Alpha 21264 Microprocessor
IEEE Micro
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
A Decoupled Predictor-Directed Stream Prefetching Architecture
IEEE Transactions on Computers
Using a serial cache for energy efficient instruction fetching
Journal of Systems Architecture: the EUROMICRO Journal
Effective Instruction Prefetching via Fetch Prestaging
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Energy-efficient and high-performance instruction fetch using a block-aware ISA
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Block-aware instruction set architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Wide and efficient trace prediction using the local trace predictor
Proceedings of the 20th annual international conference on Supercomputing
Guaranteeing instruction fetch behavior with a lookahead instruction fetch engine (LIFE)
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
An overview of achieving energy efficiency in on-chip networks
International Journal of Communication Networks and Distributed Systems
Hi-index | 0.00 |
Energy efficient architecture research has flourished recently, in an attempt to address packaging and cooling concerns of current microprocessor designs, as well as battery life for mobile computers. Moreover, architects have become increasingly concerned with the complexity of their designs in the face of scalability, verification, and manufacturing concerns.In this paper, we propose and evaluate a high performance, energy and complexity efficient front-end prefetch architecture. This design, called Serial Prefetching, combines a high fetch bandwidth branch prediction and efficient instruction prefetching architecture with a low-energy instruction cache. Serial Prefetching explores the benefit of decoupling the tag component of the cache from the data component. Cache blocks are first verified by the tag component of the cache, and then the accesses are put into a queue to be consumed by the data component of the instruction cache. Energy is saved by only accessing the correct way of the data component specified by the tag lookup in a previous cycle. The tag component does not stall on a I-cache miss, only the data component. The accesses that miss in the tag component are speculatively brought in from lower levels of the memory hierarchy. This in effect performs a prefetch, while the access migrates through the queue to be consumed by the data component.