Cache behavior of combinator graph reduction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Caching considerations for generational garbage collection
LFP '92 Proceedings of the 1992 ACM conference on LISP and functional programming
Memory system performance of programs with intensive heap allocation
ACM Transactions on Computer Systems (TOCS)
Cache performance of fast-allocating programs
FPCA '95 Proceedings of the seventh international conference on Functional programming languages and computer architecture
Using generational garbage collection to implement cache-conscious data placement
Proceedings of the 1st international symposium on Memory management
A transformation-based optimiser for Haskell
Science of Computer Programming - Special issue on the 6th European symposium on programming
Cache-conscious structure definition
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Measuring Experimental Error in Microprocessor Simulation
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The nofib Benchmark Suite of Haskell Programs
Proceedings of the 1992 Glasgow Workshop on Functional Programming
Proceedings of the 5th conference on Computing frontiers
Introducing the PilGRIM: a processor for executing lazy functional languages
IFL'10 Proceedings of the 22nd international conference on Implementation and application of functional languages
Collecting and exploiting cache-reuse metrics
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Hi-index | 0.00 |
Lazy functional programs behave differently from imperative programs and these differences extend to cache behaviour. We use hardware counters and a simple yet accurate execution cost model to analyse some large Haskell programs on the x86 architecture. The programs do not interact well with modern processors---L2 cache data miss stalls and branch misprediction stalls account for up to 60% and 32% of execution time respectively. Moreover, the program code exhibits little exploitable instruction-level parallelism.We then use simulation to pinpoint cache misses at the instruction level. With this information we apply prefetching to minimise the cost of write misses, speeding up Haskell programs by up to 22%. We conclude with more ideas for changing the Glasgow Haskell Compiler and its garbage collector to improve the cache performance of large programs.