Skewed associativity enhances performance predictability

Authors:
François Bodin;André Seznec
Affiliations:
IRISA-INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France;IRISA-INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France
Venue:
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Year:
1995

Citing 8
Cited 12

Improving register allocation for subscripted variables

PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
A case for two-way skewed-associative caches

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
A strategy for array management in local memory

Mathematical Programming: Series A and B
Skewed associativity enhances performance predictability

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Skewed-associative Caches

PARLE '93 Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe

Skewed associativity enhances performance predictability

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
An Efficient Indirect Branch Predictor

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
MORPH: a system architecture for robust high performance using customization (an NSF 100 TeraOps point design study)

FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Predictive sequential associative cache

HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Real time aspects of cluster based caches

RTCSA '95 Proceedings of the 2nd International Workshop on Real-Time Computing Systems and Applications
A 2-Way Thrashing-Avoidance Cache (TAC): An Efficient Instruction Cache Scheme for Object-Oriented Languages

ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Highly accurate and efficient evaluation of randomising set index functions

Journal of Systems Architecture: the EUROMICRO Journal
A case for a working-set-based memory hierarchy

Proceedings of the 2nd conference on Computing frontiers
Adaptive Caches: Effective Shaping of Cache Behavior to Workloads

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
YAARC: yet another approach to further reducing the rate of conflict misses

The Journal of Supercomputing
The ZCache: Decoupling Ways and Associativity

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Performance tuning becomes harder as computer technology advances. One of the factors is the increasing complexity of memory hierarchies. Most modern machines now use at least one level of cache memory. To reduce execution stalls, cache misses must be very low. Software techniques used to improve locality have been developped for numerical codes, such as loop blocking and copying. Unfortunately, the behavior of direct mapped and set associative caches is still erratic when large numerical data is accessed. Execution time can vary drasticly for the same loop kernel depending on uncontrolled factors such as array leading size. The only software method available to improve execution time stability is the copying of frequently used data, which is costly in execution time. Users are not usually cache organisation experts. They are not aware of such phenomena, and have no control over it.In this paper, we show that the recently proposed four-way skewed associative cache yields very stable execution times and good average miss ratios on blocked algorithms. As a result, execution time is faster and much more predictable than with conventional caches. As a result of its better comportment, it is possible to use larger blocks sizes with blocked algorithms, which will furthermore reduces blocking overhead costs.