High-performance architecture for dynamically updatable packet classification on FPGA

  • Authors:
  • Yun R. Qu;Shijie Zhou;Viktor K. Prasanna

  • Affiliations:
  • University of Southern California, Los Angeles, California, USA;University of Southern California, Los Angeles, California, USA;University of Southern California, Los Angeles, California, USA

  • Venue:
  • ANCS '13: Proceedings of the Ninth ACM/IEEE Symposium on Architectures for Networking and Communications Systems
  • Year:
  • 2013

Abstract

Algorithms and FPGA-based implementations for packet classification have been studied over the past decade. Algorithmic solutions have focused on high throughput; however, supporting dynamic updates has been challenging. In this paper, we present a 2-dimensional pipelined architecture for packet classification on FPGA that achieves high throughput while supporting dynamic updates. Fine-grained processing elements are arranged in a 2-dimensional array; each processing element accesses its designated memory locally, resulting in a scalable architecture. The entire array is both horizontally and vertically pipelined. As a result, it supports a high clock rate that does not deteriorate as the length of the packet header or the size of the rule set increases. The performance of the architecture does not depend on rule set features such as the number of unique values in each field. The architecture also efficiently supports range searches in individual fields. The total memory is proportional to the rule set size. Dynamic updates (modify, delete, and insert operations on the rule set at run time) are also supported on the self-reconfigurable processing elements with very little impact on the sustained throughput. Experimental results show that, for a 1K 15-tuple rule set, a state-of-the-art FPGA can sustain 190 Gbps throughput with 1 million updates/second. To the best of our knowledge, no other packet classification approach simultaneously supports both high throughput and dynamic updates of the rule set. Our architecture demonstrates 4x the energy efficiency and 2x the throughput of TCAM.
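
To make the classification task concrete, the following is a minimal software sketch of multi-field matching with range fields and run-time rule updates. It is not the authors' hardware design: the names `Rule` and `GridClassifier`, the "smaller number wins" priority convention, and the use of priority as the rule identifier are all assumptions made for illustration. The nested loops only mirror the layout described in the abstract (one row per rule, one column per header field); in the FPGA architecture these comparisons would be carried out by pipelined processing elements rather than sequentially.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # inclusive [low, high] bounds for one header field


@dataclass
class Rule:
    priority: int        # assumption: smaller number means higher priority
    fields: List[Range]  # one inclusive range per header field
    action: str


class GridClassifier:
    """Hypothetical software model: one row per rule, one column per field."""

    def __init__(self, num_fields: int) -> None:
        self.num_fields = num_fields
        self.rules: List[Rule] = []

    def insert(self, rule: Rule) -> None:
        if len(rule.fields) != self.num_fields:
            raise ValueError("rule has the wrong number of fields")
        self.rules.append(rule)

    def delete(self, priority: int) -> None:
        # Assumption: priorities are unique, so they double as rule IDs.
        self.rules = [r for r in self.rules if r.priority != priority]

    def modify(self, priority: int, new_rule: Rule) -> None:
        self.delete(priority)
        self.insert(new_rule)

    def classify(self, header: List[int]) -> Optional[str]:
        best: Optional[Rule] = None
        for rule in self.rules:  # one "row" per rule
            match = True
            # One "column" per field: AND the per-field range checks.
            for (low, high), value in zip(rule.fields, header):
                match = match and (low <= value <= high)
            if match and (best is None or rule.priority < best.priority):
                best = rule
        return best.action if best else None


clf = GridClassifier(num_fields=2)
clf.insert(Rule(priority=1, fields=[(10, 20), (80, 80)], action="drop"))
clf.insert(Rule(priority=2, fields=[(0, 255), (0, 65535)], action="forward"))
print(clf.classify([15, 80]))   # drop
print(clf.classify([15, 443]))  # forward
clf.delete(1)
print(clf.classify([15, 80]))   # forward
```

An exact-match field is expressed here as a range whose bounds are equal, which is why a single comparison form covers both cases in this sketch.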