Improving performance of small on-chip instruction caches

Authors:
M. K. Farrens; A. R. Pleszkun
Affiliations:
Computer Sciences Department, University of Wisconsin-Madison, Madison, WI; Department of Electrical and Computer Engineering, University of Colorado-Boulder, Boulder, CO
Venue:
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Year:
1989

Abstract

Most current single-chip processors employ an on-chip instruction cache to improve performance. A miss in this instruction cache causes an external memory reference that must compete with data references for access to the external memory, thereby degrading the overall performance of the processor. One common way to reduce the number of off-chip instruction requests is to increase the size of the on-chip cache. This paper presents an alternative approach, in which a combination of an instruction cache, an instruction queue, and an instruction queue buffer achieves the same effect with a much smaller instruction cache. Such an approach is significant for emerging technologies, where high circuit densities are initially difficult to achieve yet a high level of performance is desired, or for more mature technologies, where the chip area saved can be used to provide more functionality. The viability of this approach is demonstrated by its implementation in an existing single-chip processor.
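The trade-off the abstract starts from (a smaller instruction cache misses more often, and every miss becomes an off-chip request competing with data traffic) can be illustrated with a minimal sketch. It assumes a direct-mapped instruction cache, one off-chip line fill per miss, and a synthetic trace of a 48-word loop executed ten times. The function name `count_offchip_fetches`, the trace, and all sizes are hypothetical, and the sketch does not model the paper's instruction queue or instruction queue buffer.

```python
def count_offchip_fetches(trace, cache_lines, line_words):
    """Return the number of misses in a direct-mapped instruction cache.

    Each miss is counted as one off-chip request (a line fill).
    trace       -- sequence of instruction word addresses
    cache_lines -- number of lines in the cache
    line_words  -- words per cache line
    """
    tags = [None] * cache_lines  # tag held by each cache line; None means invalid
    misses = 0
    for addr in trace:
        line_addr = addr // line_words   # memory line containing this word
        index = line_addr % cache_lines  # the only cache line it may occupy
        tag = line_addr // cache_lines   # distinguishes lines sharing an index
        if tags[index] != tag:           # miss: fetch the line from off chip
            tags[index] = tag
            misses += 1
    return misses


# Hypothetical trace: a 48-word loop body (12 four-word lines) run 10 times.
loop_trace = [addr for _ in range(10) for addr in range(48)]

for lines in (16, 8):  # 64-word vs 32-word cache, 4 words per line
    fetches = count_offchip_fetches(loop_trace, lines, 4)
    print(f"{lines * 4}-word cache: {fetches} off-chip fetches")
# 64-word cache: 12 off-chip fetches
# 32-word cache: 84 off-chip fetches
```

With the whole loop resident, only the 12 cold misses go off chip. Halving the cache makes loop lines 0 to 3 and 8 to 11 evict each other on every pass, so off-chip requests grow sevenfold. This is the kind of off-chip instruction traffic that the paper aims to reduce with a much smaller cache rather than a larger one.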