Data cache performance of supercomputer applications

Authors:
David Callahan; Allan Porterfield
Affiliations:
Tera Computer Company, 400 N 34th St., Suite 300, Seattle, WA; Tera Computer Company, 400 N 34th St., Suite 300, Seattle, WA
Venue:
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Year:
1990

Abstract

Processor speed has been increasing faster than mass memory speed. One method of matching a processor's speed to memory's is to use high-speed caches. This paper examines the data cache performance of a set of computationally intensive programs. Our interest in measuring cache performance arises from an interest in improving the performance of programs during compilation. We observed that the data caches contained the values for between 45% and 99+% of the array accesses, depending on the cache and the program. The delays from the misses accounted for up to half of the total execution time of the program. The misses were grouped in a subset of source program references that resulted in misses on every access. Aggressive compilers should be able to improve program performance by focusing on those array accesses that result in cache misses.
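
To make the observation about misses clustering in a few source references more concrete, the following Python sketch runs a synthetic address trace through a small direct-mapped cache model and counts misses per reference. It is only an illustration, not the authors' measurement method: the cache geometry (64 lines of 4 words), the reference names A[i] and B[4*i], and the base addresses are all assumptions chosen for the example.

```python
from collections import defaultdict

def simulate(trace, num_lines=64, words_per_line=4):
    """Run (ref_name, word_address) pairs through a direct-mapped cache.

    Returns {ref_name: (accesses, misses)}. The geometry is hypothetical.
    """
    tags = [None] * num_lines            # block held by each line; None = empty
    stats = defaultdict(lambda: [0, 0])  # ref_name -> [accesses, misses]
    for ref, addr in trace:
        block = addr // words_per_line   # memory block containing this word
        index = block % num_lines        # cache line the block maps to
        stats[ref][0] += 1
        if tags[index] != block:         # miss: fetch the block into the line
            stats[ref][1] += 1
            tags[index] = block
    return {ref: tuple(counts) for ref, counts in stats.items()}

def example_trace(n=1024, words_per_line=4):
    """Two loops: a unit-stride sweep of A, then a sweep of B with a stride
    of one cache line. Base addresses are arbitrary illustrative values."""
    a_base, b_base = 0, 1 << 20
    trace = [("A[i]", a_base + i) for i in range(n)]
    trace += [("B[4*i]", b_base + i * words_per_line) for i in range(n)]
    return trace

results = simulate(example_trace())
for ref, (accesses, misses) in results.items():
    print(f"{ref}: {misses}/{accesses} misses ({misses / accesses:.0%})")
```

Under these assumptions the unit-stride reference misses once per cache line (a quarter of its accesses), while the strided reference misses on every access, so a single source reference accounts for most of the misses. That is the kind of reference the abstract suggests a compiler should target.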