NP-SARC: Scalable network processing in the SARC multi-core FPGA platform

Authors:
Christoforos Kachris;George Nikiforos;Vassilis Papaefstathiou;Stamatis Kavadias;Manolis Katevenis
Affiliations:
Institute of Computer Science Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece;Institute of Computer Science Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece;Institute of Computer Science Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece;Institute of Computer Science Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece;Institute of Computer Science Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2013

Abstract

A multicore FPGA platform with cache-integrated network interfaces (NIs), appropriate for scalable multicores, has been developed. It combines the best of two worlds: the flexibility of caches (using implicit communication) and the efficiency of scratchpad memories (using explicit communication). Furthermore, the proposed scheme provides virtualized user-level RDMA capabilities and special hardware primitives (counters, queues) for communication and synchronization among the cores. This paper presents how the proposed architecture can be used for network processing applications by means of the hardware synchronization mechanisms. Two representative network processing benchmarks are used: one for header processing and one for payload processing. The Multiple Reader Queue (MRQ) scheme is used for header processing, while for payload processing, where bulk data transfers are required, the user-level RDMA scheme is used. These applications are mapped to and evaluated on an FPGA platform with up to 24 processors. The performance evaluation shows that the proposed scheme can offer low-latency communication and increased programming efficiency, while also offloading communication and synchronization from the processor.
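The abstract does not spell out how the Multiple Reader Queue behaves, so the following is only a minimal software sketch of one plausible reading: the queue buffers both written items and pending read requests, and pairs them in arrival order. The class `MultipleReaderQueue`, the helper `dispatch_headers`, and the integer core ids are hypothetical names introduced for illustration; they are not taken from the paper or from the SARC hardware interface.

```python
from collections import deque


class MultipleReaderQueue:
    """Illustrative model (an assumption, not the paper's hardware design):
    pending reads and pending writes are paired in arrival order."""

    def __init__(self):
        self._data = deque()     # items written but not yet claimed
        self._readers = deque()  # ids of readers waiting for an item
        self.delivered = []      # (reader_id, item) pairs, in delivery order

    def write(self, item):
        """Enqueue an item; deliver it at once if a reader is waiting."""
        self._data.append(item)
        self._match()

    def read(self, reader_id):
        """Register a read request; satisfy it at once if data is buffered."""
        self._readers.append(reader_id)
        self._match()

    def _match(self):
        # Pair the oldest waiting reader with the oldest buffered item.
        while self._data and self._readers:
            pair = (self._readers.popleft(), self._data.popleft())
            self.delivered.append(pair)


def dispatch_headers(headers, num_cores):
    """Hand packet headers to whichever core has a read request pending."""
    mrq = MultipleReaderQueue()
    for core in range(num_cores):
        mrq.read(core)  # every core starts by asking for work
    for header in headers:
        already_delivered = len(mrq.delivered)
        mrq.write(header)
        # A core that just received a header asks for the next one
        # (header processing itself is omitted in this sketch).
        for core, _ in mrq.delivered[already_delivered:]:
            mrq.read(core)
    return mrq.delivered


if __name__ == "__main__":
    for core, header in dispatch_headers(["h0", "h1", "h2", "h3"], 2):
        print(f"core {core} processes {header}")
```

In this model the producer never has to know which core is idle: work goes to the reader whose request has been pending longest, which is one way a queue primitive of this kind could take dispatch and synchronization work off the processors.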