The cache behaviour of large lazy functional programs on stock hardware

Authors:
Nicholas Nethercote; Alan Mycroft
Affiliations:
Cambridge University, United Kingdom; Cambridge University, United Kingdom
Venue:
Proceedings of the 2002 workshop on Memory system performance
Year:
2002



Abstract

Lazy functional programs behave differently from imperative programs, and these differences extend to cache behaviour. We use hardware counters and a simple yet accurate execution cost model to analyse some large Haskell programs on the x86 architecture. The programs do not interact well with modern processors: L2 cache data miss stalls and branch misprediction stalls account for up to 60% and 32% of execution time respectively. Moreover, the program code exhibits little exploitable instruction-level parallelism. We then use simulation to pinpoint cache misses at the instruction level. With this information we apply prefetching to minimise the cost of write misses, speeding up Haskell programs by up to 22%. We conclude with more ideas for changing the Glasgow Haskell Compiler and its garbage collector to improve the cache performance of large programs.
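
To illustrate how an execution cost model can attribute a share of run time to stalls, the following is a minimal sketch and not the model used in the paper. It assumes a purely additive model (base cycles plus a fixed penalty per event). The names `Counters` and `stall_breakdown`, the penalty values, and the counter readings are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Counters:
    """Hypothetical hardware counter readings for one program run."""
    instructions: int
    l2_data_misses: int
    branch_mispredicts: int


def stall_breakdown(c, base_cpi=1.0, l2_miss_penalty=200, mispredict_penalty=10):
    """Split estimated cycles into base work and two stall sources.

    Assumes a simple additive model: every event costs a fixed number of
    cycles and the costs do not overlap. The default penalties are
    illustrative placeholders, not measured values.
    """
    base = c.instructions * base_cpi
    l2 = c.l2_data_misses * l2_miss_penalty
    branch = c.branch_mispredicts * mispredict_penalty
    total = base + l2 + branch
    return {
        "base": base / total,
        "l2_miss": l2 / total,
        "mispredict": branch / total,
    }


if __name__ == "__main__":
    # Made-up readings, only to show how the fractions are derived.
    run = Counters(instructions=1_000_000,
                   l2_data_misses=5_000,
                   branch_mispredicts=20_000)
    for source, fraction in stall_breakdown(run).items():
        print(f"{source}: {fraction:.1%}")
```

Under these assumptions, each fraction is that source's estimated cycles divided by the total, so the three shares always sum to one. A real model would need per-machine penalties and would have to account for stalls that overlap.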