With advances in reconfigurable hardware, especially field-programmable gate arrays (FPGAs), it has become possible to accelerate complex applications, such as those in scientific computing. This has led to the development of reconfigurable computers: computers that have both general-purpose processors and reconfigurable hardware, as well as memory and high-performance interconnection networks. In this paper, we describe the acceleration of molecular dynamics simulation with reconfigurable computers. We evaluate several design alternatives for the reconfigurable computer implementation. We show that a single node accelerated with reconfigurable hardware, utilizing fine-grained parallelism in the reconfigurable hardware design, achieves a speed-up of about 2X over the corresponding software-only simulation. We then parallelize the application and study the effect of acceleration on performance and scalability. Specifically, we study strong scaling, in which the problem size is fixed. We find that the unaccelerated version actually scales better because it spends more time in computation than the accelerated version does. However, we also find that a cluster of P accelerated nodes gives better performance than a cluster of 2P unaccelerated nodes.
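The two scaling observations can look contradictory at first, so a toy model may help. The sketch below is not the paper's performance model: it assumes that run time splits into a compute term that divides evenly across nodes and a communication term that grows linearly with node count. The function names, the linear communication term, and all numeric values are hypothetical and chosen only for illustration; the one figure taken from the abstract is the single-node speed-up of about 2X.

```python
def run_time(nodes, compute_time, comm_per_node, speedup=1.0):
    """Modelled strong-scaling run time (hypothetical model, not the paper's).

    compute_time  : single-node software-only compute time (assumed value)
    comm_per_node : communication cost added per extra node (assumed linear)
    speedup       : per-node acceleration factor (about 2 in the abstract)
    """
    communication = comm_per_node * (nodes - 1)  # zero on a single node
    return compute_time / (speedup * nodes) + communication


def parallel_efficiency(nodes, compute_time, comm_per_node, speedup=1.0):
    """Efficiency relative to one node of the same kind (accelerated or not)."""
    single = run_time(1, compute_time, comm_per_node, speedup)
    multi = run_time(nodes, compute_time, comm_per_node, speedup)
    return single / (nodes * multi)


if __name__ == "__main__":
    COMPUTE, COMM, P = 100.0, 0.5, 8  # illustrative numbers only
    print("efficiency, unaccelerated P :", parallel_efficiency(P, COMPUTE, COMM))
    print("efficiency, accelerated P   :", parallel_efficiency(P, COMPUTE, COMM, 2.0))
    print("time, accelerated P         :", run_time(P, COMPUTE, COMM, 2.0))
    print("time, unaccelerated 2P      :", run_time(2 * P, COMPUTE, COMM))
```

Under these assumptions, acceleration shrinks only the compute term, so communication makes up a larger share of the accelerated run and its parallel efficiency is lower. At the same time, P accelerated nodes and 2P unaccelerated nodes have the same compute term, while the smaller cluster pays less communication, which is consistent with the abstract's final comparison.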