Stream fusion is a powerful technique for automatically transforming high-level sequence-processing functions into efficient implementations. It has been used to great effect in Haskell libraries for manipulating byte arrays, Unicode text, and unboxed vectors. However, some operations, like vector append, still do not perform well within the standard stream fusion framework. Others, like SIMD computation using the SSE and AVX instructions available on modern x86 chips, do not seem to fit in the framework at all. In this paper we introduce generalized stream fusion, which solves these issues. The key insight is to bundle together multiple stream representations, each tuned for a particular class of stream consumer. We also describe a stream representation suited for efficient computation with SSE instructions. Our ideas are implemented in modified versions of the GHC compiler and the vector library. Benchmarks show that high-level Haskell code compiled with our modified compiler and libraries can run faster than both compiler-vectorized and hand-vectorized C.
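To make the bundling idea concrete, the following is a minimal sketch in plain Haskell, not the paper's implementation or the vector library's API. The `Step`/`Stream` types follow the usual stream fusion presentation; the names `Bundle`, `bElems`, `bChunks`, `fromList`, `mapB`, `sumElems`, and `sumChunks` are hypothetical and chosen only for illustration. Ordinary lists stand in for SIMD-width chunks, and the sketch shows only how one value can carry two representations from which each consumer picks the one that suits it. It does not demonstrate the compiler optimizations that remove the intermediate streams.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- One step of a stream: yield an element, skip, or finish.
data Step s a = Yield a s | Skip s | Done

-- A stream is a step function paired with a hidden seed state.
data Stream a = forall s. Stream (s -> Step s a) s

-- Turn a list into an element-at-a-time stream.
stream :: [a] -> Stream a
stream = Stream next
  where
    next []       = Done
    next (x : xs) = Yield x xs

-- Run a stream back into a list.
unstream :: Stream a -> [a]
unstream (Stream next s0) = go s0
  where
    go s = case next s of
      Done       -> []
      Skip s'    -> go s'
      Yield x s' -> x : go s'

-- Map over a stream without building an intermediate list.
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s' -> Yield (f x) s'

-- Strict left fold that consumes a stream.
foldlS :: (b -> a -> b) -> b -> Stream a -> b
foldlS f z (Stream next s0) = go z s0
  where
    go acc s = acc `seq` case next s of
      Done       -> acc
      Skip s'    -> go acc s'
      Yield x s' -> go (f acc x) s'

-- Hypothetical bundle: the same sequence in two representations.
data Bundle a = Bundle
  { bElems  :: Stream a    -- one element per step
  , bChunks :: Stream [a]  -- several elements per step (stand-in for SIMD chunks)
  }

-- Build both views from a list; the chunk size is assumed to be at least 1.
fromList :: Int -> [a] -> Bundle a
fromList n xs = Bundle (stream xs) (Stream next xs)
  where
    k       = max 1 n
    next [] = Done
    next ys = let (c, rest) = splitAt k ys in Yield c rest

-- A producer must keep every representation in the bundle in sync.
mapB :: (a -> b) -> Bundle a -> Bundle b
mapB f (Bundle es cs) = Bundle (mapS f es) (mapS (map f) cs)

-- An element-wise consumer uses the element view.
sumElems :: Num a => Bundle a -> a
sumElems = foldlS (+) 0 . bElems

-- A chunk-wise consumer uses the chunked view.
sumChunks :: Num a => Bundle a -> a
sumChunks = foldlS (\acc c -> acc + sum c) 0 . bChunks
```

Loaded into GHCi, `sumElems (mapB (* 2) (fromList 4 [1 .. 10]))` and `sumChunks` applied to the same bundle both evaluate to 110. Because Haskell is lazy, the representation a consumer does not select is never forced, which is what makes carrying several views in one value affordable in this sketch.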