We describe a programming framework for high-performance clusters with various hardware accelerators. With this framework, users can utilize the available heterogeneous resources productively and efficiently. The distributed application is highly modularized to support dynamic system configuration with changing types and numbers of accelerators. Multiple layers of communication interfaces are introduced to reduce the overhead of both control messages and data transfers. Parallelism can be exploited by controlling the accelerators under various schemes through a scheduling extension. The framework has been used to support physics simulation and financial application development. We achieve significant performance improvement on a 16-node cluster with FPGA and GPU accelerators.
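The abstract does not show the framework's API, so the following is only a minimal sketch of the dynamic, accelerator-agnostic scheduling idea it describes, with heterogeneous devices simulated by CPU threads. All names (Task, run_worker, the per-device cost parameters) and the work-distribution policy are assumptions for illustration, not the framework's actual interface.

```cpp
// Hypothetical sketch: dynamic self-scheduling of tasks across heterogeneous
// accelerator workers, simulated here with CPU threads. Names and the
// work-distribution policy are illustrative assumptions only.
#include <chrono>
#include <cstdio>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct Task { int id; };           // a unit of work (e.g. one simulation block)

std::queue<Task> task_queue;       // shared pool of outstanding tasks
std::mutex queue_mutex;

// Each worker models one accelerator (FPGA or GPU). Workers pull tasks from
// the shared queue as soon as they finish the previous one, so faster devices
// naturally receive more work (dynamic self-scheduling).
void run_worker(const std::string& device, int cost_ms) {
    while (true) {
        Task t;
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            if (task_queue.empty()) return;   // no work left
            t = task_queue.front();
            task_queue.pop();
        }
        // Stand-in for offloading the task to the device and waiting for it.
        std::this_thread::sleep_for(std::chrono::milliseconds(cost_ms));
        std::printf("task %d done on %s\n", t.id, device.c_str());
    }
}

int main() {
    for (int i = 0; i < 16; ++i) task_queue.push(Task{i});

    // Two heterogeneous workers with different (made-up) per-task costs.
    std::vector<std::thread> workers;
    workers.emplace_back(run_worker, std::string("FPGA"), 20);
    workers.emplace_back(run_worker, std::string("GPU"), 50);
    for (auto& w : workers) w.join();
    return 0;
}
```

Because each worker fetches its next task only when idle, the assignment adapts automatically to the relative speeds of the devices, which is the behaviour a fixed static partition of work would not provide.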