Exploring the performance of split data cache schemes on superscalar processors and symmetric multiprocessors

Authors:
J. Sahuquillo;S. Petit;A. Pont;V. Milutinović
Affiliations:
Department of Computer Systems, Polytechnic University of Valencia, Valencia, Spain;Department of Computer Systems, Polytechnic University of Valencia, Valencia, Spain;Department of Computer Systems, Polytechnic University of Valencia, Valencia, Spain;Department of Computer Engineering, School of Electrical Engineering, University of Belgrade, Belgrade, Serbia, Yugoslavia
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2005

Abstract

Technology scaling continues to provide smaller and faster transistors, so manufacturers can fit more transistors in the same chip area and processor architects can offer more complex and capable ILP processors. As a consequence, the number of transistors on a chip keeps increasing while the fraction of chip area reachable in a single clock cycle keeps dropping. At the same time, power consumption and heat dissipation have become serious concerns. This scenario is forcing processor designers to look for new processor organizations that deliver the same or better performance with smaller structures. On-chip cache design is especially affected, so studies proposing smaller cache organizations that maintain, or even increase, the hit ratio are welcome. Cache schemes that exploit data locality better (bypassing schemes, prefetching techniques, victim caches, etc.) are a good example.

This paper presents a data cache scheme, called the filter cache, that splits the first-level data cache into two independent organizations. Its performance is compared with that of two other proposals from the open literature, as well as with larger conventional caches. Performance is evaluated in two scenarios: a superscalar processor and a symmetric multiprocessor.

The results show that (i) in the superscalar processor, the split data caches perform as well as or better than larger conventional caches; (ii) some splitting schemes work well in multiprocessors, while others work less well because of data localities; (iii) the reuse information that some split schemes use for management is also useful for designing new competitive protocols that boost performance in multiprocessors; and (iv) the filter data cache achieves the best performance in both scenarios.