In this paper, we propose a design technique to pipeline cache memories for high-bandwidth applications. As technology scales, cache access latencies grow to multiple clock cycles. The proposed pipelined cache architecture can accept a new access every clock cycle, thereby increasing bandwidth and overall processor performance. The architecture uses banking to reduce bit-line and word-line delay, so that the word-line to sense-amplifier delay fits within a single clock cycle. Experimental results show that optimal banking allows the cache to be split into multiple pipeline stages whose delays each match the clock cycle time. The design is fully scalable and can be applied to future technology generations. Power, delay, and area estimates show that, on average, the proposed pipelined cache improves MOPS (millions of operations per unit time per unit area per unit energy) by 40-50% over current cache architectures.
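To make the pipelining idea concrete, below is a minimal, cycle-level sketch in Python. It is not the paper's implementation: the bank count (NUM_BANKS), the three-stage split (decode, word-line/bit-line access, sense-amplifier/output drive), and the one-cycle-per-stage timing are all illustrative assumptions. The sketch only demonstrates the key property the abstract claims: each individual access still takes several cycles of latency, but the cache accepts a new access every clock, so throughput is one access per cycle.

```python
# Illustrative model of a banked, pipelined cache (assumed parameters,
# not the authors' design). Each stage is assumed to fit in one clock.

from collections import deque

NUM_BANKS = 4    # assumed bank count; banking shortens bit/word lines
PIPE_STAGES = 3  # decode -> bank access -> sense-amp/output, 1 cycle each

class PipelinedCache:
    def __init__(self):
        # One shift-register pipeline per bank; each slot holds the
        # address occupying that stage this cycle, or None if empty.
        self.pipes = {b: deque([None] * PIPE_STAGES)
                      for b in range(NUM_BANKS)}

    def tick(self, addr=None):
        """Advance all banks one clock; optionally issue a new request.
        Returns the addresses whose accesses complete this cycle."""
        done = []
        for bank, pipe in self.pipes.items():
            finished = pipe.pop()  # request leaving the final stage
            if finished is not None:
                done.append(finished)
            # Low address bits select the bank (assumed interleaving).
            new = addr if addr is not None and addr % NUM_BANKS == bank \
                  else None
            pipe.appendleft(new)   # request entering the decode stage
        return done

if __name__ == "__main__":
    cache = PipelinedCache()
    # Issue one access per cycle: after the pipeline fills (PIPE_STAGES
    # cycles), one access also completes per cycle.
    for cycle in range(8):
        completed = cache.tick(addr=cycle)
        print(f"cycle {cycle}: issued addr {cycle}, completed {completed}")
```

Running the loop shows address 0 completing at cycle 3 and one completion per cycle thereafter, i.e., a latency of PIPE_STAGES cycles but a sustained bandwidth of one access per clock, which is the effect the proposed banking and stage partitioning are designed to achieve.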