The GCA (Global Cellular Automata) model is a massively parallel computation model that generalizes the Cellular Automata model. A GCA cell contains data and link information. Using the link information, each cell has dynamic read access to any other cell in the field. The data and link information are updated in every generation. The GCA model is applicable and efficient for a wide range of parallel algorithms (sorting, vector reduction, graph algorithms, matrix computations, etc.). The experimental language GCAL was developed to describe algorithms for the GCA model. GCAL programs can be transformed automatically into a data parallel architecture (DPA). For the N-body problem, this paper shows how the force calculation between the masses can be described in GCAL and synthesized into a data parallel architecture. First, the GCAL description of the application is transformed into a Verilog description, which is inserted into a Verilog template describing the general DPA. The complete Verilog code is then used as input to an FPGA synthesis tool, which generates the application-specific DPA. Two different DPAs are generated, a "horizontal" and a "vertical" DPA. The horizontal DPA uses 17 floating-point operators in each deep pipeline. In contrast, the vertical DPA uses only one floating-point operation at a time out of a set of 6 floating-point operators. The two architectures are compared with respect to resource consumption, time per cell operation, and cost (logic elements * execution time). The horizontal DPA turned out to be approximately 15 times more cost efficient than the vertical DPA.
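The cell behaviour described in the abstract (data plus a link, synchronous update per generation, read access to any other cell) can be illustrated with a minimal software sketch of an N-body force accumulation. This is not GCAL and not the architecture from the paper. The names `Cell`, `step`, and `accumulate_forces`, the rotating link pattern, and the unit gravitational constant are assumptions made purely for illustration.

```python
from dataclasses import dataclass

G = 1.0  # gravitational constant in arbitrary units (assumption)


@dataclass(frozen=True)
class Cell:
    """One GCA cell: data (mass, position, accumulated force) plus a link."""
    mass: float
    pos: tuple                      # (x, y, z)
    force: tuple = (0.0, 0.0, 0.0)  # force accumulated so far
    link: int = 0                   # index of the cell read in this generation


def step(cells):
    """Compute one generation: every cell reads the OLD state of its linked cell."""
    n = len(cells)
    new_cells = []
    for i, c in enumerate(cells):
        other = cells[c.link]       # global read access through the link
        force = c.force
        if c.link != i:             # a cell exerts no force on itself
            d = tuple(b - a for a, b in zip(c.pos, other.pos))
            r2 = sum(x * x for x in d)
            scale = G * c.mass * other.mass / (r2 * r2 ** 0.5)
            force = tuple(f + scale * x for f, x in zip(c.force, d))
        # Data and link are both updated; the link advances to the next cell.
        new_cells.append(Cell(c.mass, c.pos, force, (c.link + 1) % n))
    return new_cells


def accumulate_forces(bodies):
    """bodies: list of (mass, (x, y, z)). Returns the net force on each body."""
    n = len(bodies)
    cells = [Cell(m, p, link=(i + 1) % n) for i, (m, p) in enumerate(bodies)]
    for _ in range(n - 1):          # n - 1 generations visit every other cell once
        cells = step(cells)
    return [c.force for c in cells]
```

In this sketch each generation builds a new list from the old one, so all cells see the same previous state, which mirrors the synchronous update of a cellular automaton. A call such as `accumulate_forces([(1.0, (0.0, 0.0, 0.0)), (2.0, (2.0, 0.0, 0.0))])` returns one force vector per body.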