Cache behavior of combinator graph reduction

Authors:
Philip J. Koopman, Jr.; Peter Lee; Daniel P. Siewiorek
Affiliations:
Carnegie Mellon University; Carnegie Mellon University; Carnegie Mellon University
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
1992

Abstract

The results of cache-simulation experiments with an abstract machine for reducing combinator graphs are presented. The abstract machine, called TIGRE, exhibits reduction rates that, for similar kinds of combinator graphs on similar kinds of hardware, compare favorably with previously reported techniques. Furthermore, TIGRE maps easily and efficiently onto standard computer architectures, particularly those that allow a restricted form of self-modifying code. This provides some indication that the conventional "stored program" organization of computer systems is not necessarily an inappropriate one for functional programming language implementations.

This is not to say, however, that present-day computer systems are well equipped to reduce combinator graphs. In particular, the behavior of the cache memory has a significant effect on performance. To study and quantify this effect, trace-driven cache simulations of a TIGRE graph reducer running on a reduced instruction-set computer are conducted. The results of these simulations are presented with the following hardware-cache parameters varied: cache size, block size, associativity, memory update policy, and write-allocation policy. The simulations start from the cache organization of a commercially available system and then measure the sensitivity of performance to variations in each parameter. The simulation results indicate that combinator-graph reduction using TIGRE runs most efficiently when using a cache memory with an allocate-on-write-miss strategy, moderately large block size (preferably with subblock placement), and copy-back memory updates.
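To make the varied parameters concrete, the following is a minimal sketch of a trace-driven cache model. It is not the simulator used in the study: the class name `CacheSim`, its field names, the default sizes, the LRU replacement policy, and the way memory traffic is counted are all assumptions made for illustration, and subblock placement is not modeled.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class CacheSim:
    """Hypothetical set-associative cache model with LRU replacement."""
    size_bytes: int = 64 * 1024      # total cache capacity (assumed default)
    block_bytes: int = 16            # block (line) size (assumed default)
    ways: int = 1                    # associativity; 1 means direct-mapped
    copy_back: bool = True           # False selects write-through updates
    write_allocate: bool = True      # allocate a block on a write miss
    hits: int = 0
    misses: int = 0
    mem_reads: int = 0               # blocks fetched from main memory
    mem_writes: int = 0              # write-throughs plus dirty-block evictions
    sets: list = field(init=False, default_factory=list)

    def __post_init__(self):
        n_sets = self.size_bytes // (self.block_bytes * self.ways)
        # Each set maps tag -> dirty flag; insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(n_sets)]

    def access(self, addr, is_write):
        """Replay one memory reference from the trace."""
        block = addr // self.block_bytes
        lines = self.sets[block % len(self.sets)]
        tag = block // len(self.sets)
        if tag in lines:
            self.hits += 1
            lines.move_to_end(tag)           # mark as most recently used
        else:
            self.misses += 1
            if is_write and not self.write_allocate:
                self.mem_writes += 1         # write goes around the cache
                return
            if len(lines) >= self.ways:
                _, dirty = lines.popitem(last=False)  # evict the LRU block
                if dirty:
                    self.mem_writes += 1     # copy the dirty block back
            self.mem_reads += 1              # fetch the missing block
            lines[tag] = False
        if is_write:
            if self.copy_back:
                lines[tag] = True            # defer the memory update
            else:
                self.mem_writes += 1         # write-through on every store


# A tiny made-up trace: three stores into one block, then a conflicting load.
trace = [(0, True), (4, True), (8, True), (64, False)]
for allocate, copy_back in [(True, True), (False, False)]:
    cache = CacheSim(size_bytes=64, block_bytes=16, ways=1,
                     copy_back=copy_back, write_allocate=allocate)
    for addr, is_write in trace:
        cache.access(addr, is_write)
    print(allocate, copy_back, cache.hits, cache.misses,
          cache.mem_reads, cache.mem_writes)
```

On this made-up trace the allocate-on-write-miss, copy-back configuration turns the second and third stores into hits and sends a single block back to memory, while the no-allocate, write-through configuration misses on every reference. The trace is far too small to support any conclusion; it only shows how each parameter enters the model.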