Scalable mpNoC for massively parallel systems - Design and implementation on FPGA

Authors:
M. Baklouti;Y. Aydi;Ph. Marquet;J. L. Dekeyser;M. Abid
Affiliations:
Computer Embedded Systems (CES), Univ. Sfax, National School of Engineers (ENIS), BP 1173, Sfax 3038, Tunisia and Univ. Lille, F-59044 Villeneuve d'ascq, France and LIFL, Univ. Lille 1, F-59650 Vi ...;Computer Embedded Systems (CES), Univ. Sfax, National School of Engineers (ENIS), BP 1173, Sfax 3038, Tunisia;Univ. Lille, F-59044 Villeneuve d'ascq, France and LIFL, Univ. Lille 1, F-59650 Villeneuve d'ascq, France and INRIA Lille Nord Europe, F-59650 Villeneuve d'ascq, France and UMR 8022, CNRS, F-59650 ...;Univ. Lille, F-59044 Villeneuve d'ascq, France and LIFL, Univ. Lille 1, F-59650 Villeneuve d'ascq, France and INRIA Lille Nord Europe, F-59650 Villeneuve d'ascq, France and UMR 8022, CNRS, F-59650 ...;Computer Embedded Systems (CES), Univ. Sfax, National School of Engineers (ENIS), BP 1173, Sfax 3038, Tunisia
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2010

Abstract

High chip-level integration enables the implementation of large-scale parallel processing architectures with 64 or more processing nodes on a single chip or on an FPGA device. These parallel systems require a cost-effective yet high-performance interconnection scheme to provide the necessary communication between processors. The massively parallel Network on Chip (mpNoC) was proposed to meet the demand for irregular parallel communications in the massively parallel processing System on Chip (mppSoC). Targeting FPGA-based design, this paper proposes an efficient low-level RTL implementation of mpNoC that takes design constraints into account. The proposed network is designed as an FPGA-based Intellectual Property (IP) block that can be configured in different communication modes. It supports communication between processors as well as parallel I/O data transfer, which is a key issue in a SIMD system. The mpNoC RTL implementation performs well in terms of area, throughput, and power consumption, which are important metrics for an on-chip implementation. mpNoC is a flexible architecture suitable for use in FPGA-based parallel systems. This paper introduces the basic mppSoC architecture and focuses mainly on the flexible IP-based design of mpNoC and its implementation on FPGA. The integration of mpNoC in mppSoC is also described. Implementation results on a Stratix II FPGA device are given for three data-parallel applications run on mppSoC. The results obtained support the effectiveness of the proposed parallel network and show that mpNoC is a lightweight parallel network, suitable for both small and large FPGA-based parallel systems.