Implementing Parallel Google Map-Reduce in Eden
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
FPMR: MapReduce framework on FPGA
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
Axel: a heterogeneous cluster with FPGAs and GPUs
Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
High-Performance Quasi-Monte Carlo Financial Simulation: FPGA vs. GPP vs. GPU
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Combining optimizations in automated low power design
Proceedings of the Conference on Design, Automation and Test in Europe
The RLOC is dead - long live the RLOC
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
Dynamic workload balancing deques for branch and bound algorithms in the message passing interface
International Journal of High Performance Systems Architecture
Automated Mapping of the MapReduce Pattern onto Parallel Computing Platforms
Journal of Signal Processing Systems
Mapping a data-flow programming model onto heterogeneous platforms
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
A dynamic self-scheduling scheme for heterogeneous multiprocessor architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Hi-index | 0.00 |
The map-reduce model requires users to express their problem in terms of a map function that processes single records in a stream, and a reduce function that merges all mapped outputs to produce a final result. By exposing structural similarity in this way, a number of key issues associated with the design of custom computing machines including parallelisation; design complexity; software-hardware partitioning; hardware-dependency, portability and scalability can be easily addressed. We present an implementation of a map-reduce library supporting parallel field programmable gate arrays (FPGAs) and graphics processing units (GPUs). Parallelisation due to pipelining, multiple data paths and concurrent execution of FPGA/GPU hardware is automatically achieved. Users first specify the map and reduce steps for the problem in ANSI Cand no knowledge of the underlying hardware or parallelisation is needed. The source code is then manually translated into a pipelined data path which, along with the map-reduce library, is compiled into appropriate binary configurations for the processing units. We describe our experience in developing a number of benchmark problems in signal processing, Monte Carlo simulation and scientific computing as well as report on the performance of FPGA, GPU and hetereogeneous systems.