Automatic data partitioning for the Agere Payload Plus network processor

Authors:
Steve Carr; Philip Sweany
Affiliations:
Michigan Technological University, Houghton, MI; University of North Texas, Denton, TX
Venue:
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Year:
2004

Abstract

With the ever-increasing pervasiveness of the Internet and its stringent performance requirements, network system designers have begun using specialized chips to increase the performance of network functions. Many more advanced functions, such as traffic shaping and policing, are now being implemented at the network interface layer to avoid the delays that occur when these functions are handled by a general-purpose CPU. While some designs use ASICs to handle network functions, many system designers have moved toward programmable network processors because of their greater flexibility and lower design cost. In this paper, we describe a code generation technique designed for the Agere Payload Plus (APP) network processor. This processor uses a multi-block pipeline containing a Fast Pattern Processor (FPP) for classification, a Routing Switch Processor (RSP) for traffic management, and a third block, the Agere Systems Interface (ASI), which provides additional functionality for performance. This paper focuses on code generation for the clustered VLIW compute engines on the RSP. Currently, due to the real-time nature of the applications run on the APP, the programmer must lay out and partition the application-specific data by hand to get good performance. The major contribution of this paper is to remove the need for hand partitioning for the RSP compute engines. We propose both a greedy code-generation approach that achieves harmonic mean performance equal to that of code hand partitioned by an application programmer and a genetic algorithm that achieves a harmonic mean speedup of 1.08 over the same hand-partitioned code. Achieving harmonic mean performance equal to or better than hand partitioning removes the need to hand code for performance, allowing the programmer to spend more time on algorithm development.
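
The abstract summarizes its results as a harmonic mean of speedups over hand-partitioned code. As a rough illustration of how such a figure is computed, the sketch below takes a list of per-benchmark speedups and returns their harmonic mean. The speedup values and the function name `harmonic_mean_speedup` are hypothetical and are not taken from the paper; only the formula itself (n divided by the sum of reciprocals) is standard.

```python
from statistics import harmonic_mean  # standard-library reference implementation

# Hypothetical per-benchmark speedups over hand-partitioned code.
# These numbers are illustrative only; they are NOT results from the paper.
speedups = [0.95, 1.00, 1.10, 1.30]


def harmonic_mean_speedup(values):
    """Return the harmonic mean: n divided by the sum of reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("speedups must be a non-empty list of positive numbers")
    return len(values) / sum(1.0 / v for v in values)


hm = harmonic_mean_speedup(speedups)
am = sum(speedups) / len(speedups)  # arithmetic mean, for comparison

# The hand-written formula should agree with the standard library.
assert abs(hm - harmonic_mean(speedups)) < 1e-12

print(f"harmonic mean speedup:   {hm:.3f}")
print(f"arithmetic mean speedup: {am:.3f}")
```

The harmonic mean never exceeds the arithmetic mean, so it is the more conservative way to summarize a set of speedup ratios: a single large speedup on one benchmark cannot mask slowdowns on the others.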