ATUM: a new technique for capturing address traces using microcode
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Line (block) size choice for CPU cache memories
IEEE Transactions on Computers
Analysis of cache performance for operating systems and multiprogramming
Performance tradeoffs in cache design
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Computer architecture: a quantitative approach
Cache and memory hierarchy design: a performance-directed approach
Sequentiality and prefetching in database systems
ACM Transactions on Database Systems (TODS)
Characterizing the Storage Process and Its Effect on the Update of Main Memory by Write Through
Journal of the ACM (JACM)
Cache evaluation and the impact of workload choice
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Bibliography and reading on CPU cache memories and related topics
ACM SIGARCH Computer Architecture News
The effect of instruction fetch strategies upon the performance of pipelined instruction units
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Sequential prefetch strategies for instructions and data
The Memory Architecture and the Cache and Memory Management Unit for
Data prefetching in multiprocessor vector cache memories
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
A study of I/O system organizations
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Evaluating performance of prefetching second level caches
ACM SIGMETRICS Performance Evaluation Review
A unified architectural tradeoff methodology
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Instruction fetching: coping with code bloat
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Hardware implementation issues of data prefetching
ICS '95 Proceedings of the 9th international conference on Supercomputing
Architecture Technique Trade-Offs Using Mean Memory Delay Time
IEEE Transactions on Computers
Trap-driven memory simulation with Tapeworm II
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Run-time spatial locality detection and optimization
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Exploiting spatial locality in data caches using spatial footprints
Proceedings of the 25th annual international symposium on Computer architecture
Functional Implementation Techniques for CPU Cache Memories
IEEE Transactions on Computers - Special issue on cache memory and related problems
The pool of subsectors cache design
ICS '99 Proceedings of the 13th international conference on Supercomputing
ACM Computing Surveys (CSUR)
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Designing a Modern Memory Hierarchy with Hardware Prefetching
IEEE Transactions on Computers
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared Memory Multiprocessor
IEEE Transactions on Parallel and Distributed Systems
Minerva: An Adaptive Subblock Coherence Protocol for Improved SMP Performance
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
A new NAND-type flash memory package with smart buffer system for spatial and temporal localities
Journal of Systems Architecture: the EUROMICRO Journal
Reducing Cache Pollution via Dynamic Data Prefetch Filtering
IEEE Transactions on Computers
Can High Bandwidth and Latency Justify Large Cache Blocks in Scalable Multiprocessors?
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 01
Increasing cache capacity through word filtering
Proceedings of the 21st annual international conference on Supercomputing
Next high performance and low power flash memory package structure
Journal of Computer Science and Technology
Application-Specific hardware-driven prefetching to improve data cache performance
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
This paper explores the interactions between a cache's block size, fetch size, and fetch policy from the perspective of maximizing system-level performance. It has been previously noted that, given a simple fetch strategy, the performance-optimal block size is almost always four or eight words [10]. If there is even a small cycle-time penalty associated with either longer blocks or longer fetches, then the performance-optimal size is noticeably reduced. In split cache organizations, where the fetch and block sizes of the instruction and data caches are all independent design variables, the instruction cache's block size and fetch size should be the same. For the workload and write-back write policy used in this trace-driven simulation study, the instruction cache block size should be about a factor of two greater than the data cache fetch size, which in turn should be equal to or double the data cache block size. The simplest fetch strategy, fetching only on a miss and stalling the CPU until the fetch is complete, works well. More complicated fetch strategies do not produce the performance improvements suggested by their reductions in miss ratio, because of limited memory resources and a strong temporal clustering of cache misses. For the environments simulated here, the most effective fetch strategy improved performance by between 1.7% and 4.5% over this simplest strategy.
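The block-size tradeoff the abstract describes can be illustrated with a toy trace-driven sketch (hypothetical code, not the paper's simulator): a direct-mapped cache that fetches exactly one block on a miss and stalls, combined with a simple mean-memory-delay model in which the miss penalty grows with block size. All parameters here (cache size, miss latency, transfer width, and the synthetic trace) are illustrative assumptions.

```python
def miss_ratio(trace, cache_bytes=4096, block_bytes=16):
    """Miss ratio of a direct-mapped cache that fetches one block per miss."""
    n_blocks = cache_bytes // block_bytes
    tags = [None] * n_blocks            # one tag per frame (direct-mapped)
    misses = 0
    for addr in trace:
        block = addr // block_bytes     # block-aligned address
        index = block % n_blocks
        if tags[index] != block:        # miss: fetch the block, stall the CPU
            tags[index] = block
            misses += 1
    return misses / len(trace)

def mean_access_time(mr, block_bytes, latency=6, bytes_per_cycle=4):
    """Cycles per access: 1-cycle hit plus a miss penalty that grows with block size."""
    penalty = latency + block_bytes // bytes_per_cycle
    return 1 + mr * penalty

# Short two-word sequential runs at widely spaced addresses: once the block
# covers a whole run, larger blocks no longer cut the miss ratio, while the
# transfer part of the miss penalty keeps growing -- so a small block wins.
trace = [a for base in range(0, 512 * 1024, 1024) for a in (base, base + 4)]
for bs in (4, 8, 16, 32):
    mr = miss_ratio(trace, block_bytes=bs)
    print(f"block={bs:2d}B  miss ratio={mr:.2f}  cycles/access={mean_access_time(mr, bs):.2f}")
```

On this synthetic trace the mean delay is minimized at an 8-byte (two-word) block, echoing the abstract's point that even modest per-fetch penalties pull the performance-optimal block size down to a few words; real traces with more spatial locality would shift the optimum.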