Skewed Associativity Improves Program Performance and Enhances Predictability

Authors:
François Bodin; André Seznec
Venue:
IEEE Transactions on Computers
Year:
1997

Abstract

Performance tuning becomes harder as computer technology advances. One factor is the increasing complexity of memory hierarchies: most modern machines now use at least one level of cache memory. To reduce execution stalls, cache miss ratios must be very low. Software techniques that improve locality, such as loop blocking and copying, have been developed for numerical codes. Unfortunately, the behavior of direct-mapped and set-associative caches is still erratic when large data arrays are accessed. Execution time can vary drastically for the same loop kernel depending on uncontrolled factors such as the leading dimension of an array. The only software method available to improve execution time stability is copying frequently used data, which is costly in execution time. Users are usually not cache organization experts; they are unaware of such phenomena and have no control over them.

In this paper, we show that the recently proposed four-way skewed-associative cache yields very stable execution times and good average miss ratios on blocked algorithms. As a result, execution is faster and much more predictable than with conventional caches. It is therefore possible to use larger block sizes in blocked algorithms, which further reduces blocking overhead.
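
The sensitivity to an array's leading dimension can be illustrated with a toy model. The sketch below is not taken from the paper: the cache geometry (8 KB, 32-byte lines), the tile shape, the two-way organization (the paper studies a four-way one), and the XOR index functions are all assumptions chosen for brevity. It counts misses while a small tile of a column-major array is swept repeatedly, first in a direct-mapped cache and then in a simplified skewed-associative cache of the same capacity.

```python
def tile_lines(leading_dim, rows=4, cols=64, elem_bytes=8, line_bytes=32):
    """Cache-line numbers touched by one sweep over a rows x cols tile
    of a column-major array with the given leading dimension."""
    return [((i + j * leading_dim) * elem_bytes) // line_bytes
            for j in range(cols) for i in range(rows)]


def direct_mapped_misses(leading_dim, passes=4, num_lines=256):
    """Misses in a direct-mapped cache: the slot is line mod num_lines."""
    tags = [None] * num_lines
    misses = 0
    for _ in range(passes):
        for line in tile_lines(leading_dim):
            slot = line % num_lines
            if tags[slot] != line:
                misses += 1
                tags[slot] = line
    return misses


def skewed_two_way_misses(leading_dim, passes=4, lines_per_bank=128):
    """Misses in a toy two-way skewed cache: each bank uses its own index
    function. The XOR functions are illustrative assumptions, not the
    mapping functions defined in the paper."""
    mask = lines_per_bank - 1
    index_fns = (lambda x: (x ^ (x >> 7)) & mask,
                 lambda x: (x ^ (x >> 8)) & mask)
    banks = [[None] * lines_per_bank for _ in index_fns]
    stamps = [[0] * lines_per_bank for _ in index_fns]
    misses = 0
    clock = 0
    for _ in range(passes):
        for line in tile_lines(leading_dim):
            clock += 1
            slots = [f(line) for f in index_fns]
            way = next((b for b, s in enumerate(slots)
                        if banks[b][s] == line), None)
            if way is None:
                misses += 1
                # Replace the candidate entry that was used least recently
                # (never-used entries have stamp 0 and are chosen first).
                way = min(range(len(slots)), key=lambda b: stamps[b][slots[b]])
                banks[way][slots[way]] = line
            stamps[way][slots[way]] = clock
    return misses


if __name__ == "__main__":
    for ld in (1024, 1028):
        print(ld, direct_mapped_misses(ld), skewed_two_way_misses(ld))
```

With a leading dimension of 1024 elements, consecutive columns are exactly one cache size apart, so every column of the tile competes for the same direct-mapped slot and the tile is reloaded on each pass. Padding the leading dimension to 1028 removes the conflicts. In this toy model the skewed organization incurs only the initial cold misses in both cases, which is the kind of stability the abstract describes; the model says nothing about real workloads or about the paper's measured miss ratios.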