NP-SARC: Scalable network processing in the SARC multi-core FPGA platform

  • Authors:
  • Christoforos Kachris;George Nikiforos;Vassilis Papaefstathiou;Stamatis Kavadias;Manolis Katevenis

  • Affiliations:
  • Institute of Computer Science Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece;Institute of Computer Science Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece;Institute of Computer Science Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece;Institute of Computer Science Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece;Institute of Computer Science Foundation for Research and Technology (FORTH), Heraklion, Crete, Greece

  • Venue:
  • Journal of Systems Architecture: the EUROMICRO Journal
  • Year:
  • 2013

Abstract

A multicore FPGA platform with cache-integrated network interfaces (NIs), suitable for scalable multicores, has been developed; it combines the best of two worlds: the flexibility of caches (using implicit communication) and the efficiency of scratchpad memories (using explicit communication). Furthermore, the proposed scheme provides virtualized user-level RDMA capabilities and special hardware primitives (counters, queues) for communication and synchronization among the cores. This paper presents how the proposed architecture can be used for network processing applications by means of the hardware synchronization mechanisms. Two representative network processing benchmarks are used: one for header processing and one for payload processing. The Multiple Reader Queue (MRQ) scheme is used for header processing, while the user-level RDMA scheme is used for payload processing, where bulk data transfer is required. These applications are mapped to and evaluated on an FPGA platform with up to 24 processors. The performance evaluation shows that the proposed scheme can offer low-latency communication and increased programming efficiency while also offloading the communication and synchronization processes from the processor.
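As a rough software analogy of the header-processing setup described in the abstract, the sketch below lets several reader threads pull packet headers from one shared queue. It is not the paper's hardware MRQ: the function name `process_headers`, the checksum step, and the assumption that each queued item is delivered to exactly one reader are all illustrative choices that the abstract does not spell out.

```python
import queue
import threading


def process_headers(headers, num_readers=4):
    """Software analogy only: several readers drain one shared queue.

    Assumption: each header is handed to exactly one reader.
    """
    work = queue.Queue()
    for header in headers:
        work.put(header)
    # One sentinel per reader marks the end of the input.
    for _ in range(num_readers):
        work.put(None)

    # Each reader appends only to its own list, so no extra locking is needed.
    results = [[] for _ in range(num_readers)]

    def reader(idx):
        while True:
            item = work.get()
            if item is None:
                break
            # Hypothetical header-processing step: a byte sum modulo 256.
            results[idx].append((item, sum(item) % 256))

    threads = [
        threading.Thread(target=reader, args=(i,)) for i in range(num_readers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    sample = [bytes([i, i + 1, i + 2]) for i in range(8)]
    per_reader = process_headers(sample, num_readers=3)
    print(sum(len(r) for r in per_reader), "headers processed")
```

In this analogy the shared queue takes over the hand-off between producer and readers; in the platform described above, that hand-off would be performed by the hardware queue primitive rather than by software locks.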