Cellular Automata Simulations on a FPGA cluster

Authors:
S. Murtaza;A.G. Hoekstra;P.M.A. Sloot
Affiliations:
University of Amsterdam;University of Amsterdam;University of Amsterdam
Venue:
International Journal of High Performance Computing Applications
Year:
2011

Abstract

The emergence of multicore architectures, and the chip industry's plan to roll out hundreds of cores per die in the near future, may have triggered the evolution of von Neumann architectures towards a parallel processing paradigm. The capability to have hundreds of cores per die is exciting, but how to use such a resource optimally remains a challenge. Since there are no straightforward solutions, we seek inspiration from relevant scientific processes. Cellular automata, which are inherently decentralized and spatially extended structures, provide a potential candidate among parallel processing alternatives. The spatial parallelism available on field programmable gate arrays (FPGAs) makes them an ideal platform for investigating cellular automata systems as potential parallel processing paradigms on multicore architectures. This article presents a massively parallel implementation of a floating-point-based cellular automaton on special-purpose hardware, namely FPGAs. The challenge is to best map an application to the underlying many-core architecture and to address issues such as inter-core communication, scalability, and flexibility in terms of both hardware and software. Maxwell, a 64-node FPGA supercomputer, is used for accelerator implementations that range from single-FPGA to multiple-FPGA systems. A performance model is proposed and demonstrated to closely reproduce measured execution times. The performance model enables identification of the main sources of overhead and suggests improvements to the architecture and implementation of the lattice Boltzmann method and of compute-bound cellular automata in general. Further, a 2 million cell D2Q9 lattice Boltzmann simulation with periodic boundary conditions, run on a multiple-FPGA accelerator implementation, is presented. The performance model shows that the FPGA-enabled PC cluster is the preferred multiple-FPGA organization over the multiple-FPGA-based PC setup. Latency hiding is fully exploited in the PC cluster-based implementations and is demonstrated using system profiling.
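
As an illustration of the kind of cell update the abstract refers to, the following is a minimal software sketch of one step of a two-dimensional, nine-velocity lattice Boltzmann automaton with periodic boundary conditions. It is not the authors' FPGA implementation: the single-relaxation-time (BGK) collision rule, the relaxation time `tau`, and all function names are assumptions chosen for illustration only.

```python
# Nine discrete velocities of the 2D lattice and their standard weights.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]
WEIGHTS = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order equilibrium populations for one cell (assumed BGK model)."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(VELOCITIES, WEIGHTS):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def lbm_step(f, tau=1.0):
    """One collide-and-stream update; f[y][x] holds the 9 populations of a cell."""
    ny, nx = len(f), len(f[0])
    new = [[[0.0] * 9 for _ in range(nx)] for _ in range(ny)]
    for y in range(ny):
        for x in range(nx):
            cell = f[y][x]
            rho = sum(cell)
            ux = sum(c[0] * fi for c, fi in zip(VELOCITIES, cell)) / rho
            uy = sum(c[1] * fi for c, fi in zip(VELOCITIES, cell)) / rho
            feq = equilibrium(rho, ux, uy)
            for i, (cx, cy) in enumerate(VELOCITIES):
                # Collision: relax each population towards equilibrium.
                post = cell[i] - (cell[i] - feq[i]) / tau
                # Streaming: move to the neighbour, wrapping at the edges
                # (periodic boundary conditions).
                new[(y + cy) % ny][(x + cx) % nx][i] = post
    return new


def total_mass(f):
    """Sum of all populations over the whole lattice."""
    return sum(sum(cell) for row in f for cell in row)
```

Each cell reads only its own state and writes to its eight nearest neighbours, which is the locality that makes such automata a natural fit for spatially parallel hardware. Because collision preserves the cell density and streaming only moves populations, `total_mass` should stay constant from step to step on a periodic lattice.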