We present Huckleberry, a tool that automatically generates parallel implementations for multi-core platforms from sequential recursive divide-and-conquer programs. The recursive programming model is a good match for parallel systems because it highlights the temporal and spatial locality of data use. Huckleberry's code generator uses the recursive algorithm not only to divide a problem into smaller tasks automatically, but also to derive lower-level parts of the implementation, such as data distribution and inter-core synchronization mechanisms. We apply Huckleberry to a multi-core platform based on the Cell BE processor and show how it generates parallel code for a variety of sequential benchmarks.
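To make the starting point concrete, the following is a minimal hand-written sketch of the kind of transformation the abstract describes: a sequential recursive divide-and-conquer routine, and a variant in which the top levels of the recursion become independent tasks. It is not Huckleberry's output or interface. The merge sort example, the function names, the `depth` parameter, and the use of a Python thread pool are all assumptions chosen for illustration, and the sketch leaves out data distribution and inter-core synchronization on the Cell BE entirely.

```python
from concurrent.futures import ThreadPoolExecutor


def merge(left, right):
    """Combine two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(data):
    """Sequential recursive divide-and-conquer sort (the assumed input program)."""
    if len(data) <= 1:
        return list(data)
    mid = len(data) // 2
    return merge(merge_sort(data[:mid]), merge_sort(data[mid:]))


def parallel_merge_sort(data, pool, depth=2):
    """Hypothetical parallel variant: the first `depth` recursion levels spawn tasks.

    Each level submits the left half to the pool and handles the right half in
    the current thread, so at most 2**depth - 1 tasks are submitted. The pool
    needs at least that many workers, because a task blocks while waiting for
    the subtask it spawned.
    """
    if depth == 0 or len(data) <= 1:
        return merge_sort(data)  # below the cutoff, fall back to sequential code
    mid = len(data) // 2
    left = pool.submit(parallel_merge_sort, data[:mid], pool, depth - 1)
    right = parallel_merge_sort(data[mid:], pool, depth - 1)
    return merge(left.result(), right)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(parallel_merge_sort([5, 3, 8, 1, 9, 2, 7], pool))
```

The two recursive calls operate on disjoint halves of the input, which is what makes them safe to run as separate tasks. In CPython, threads only illustrate the task structure; they do not provide the speedup a multi-core target would.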