Balancing register allocation across threads for a multithreaded network processor

Authors:
Xiaotong Zhuang;Santosh Pande
Affiliations:
Georgia Institute of Technology, Atlanta, GA;Georgia Institute of Technology, Atlanta, GA
Venue:
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Year:
2004



Abstract

Modern network processors employ multithreading to allow concurrency among multiple packet-processing tasks. We studied the properties of applications running on network processors and observed that imbalanced register requirements across threads at different program points can lead to poor performance. Applications often require some threads to be more performance-critical than others, so by controlling register allocation across threads one can influence the performance of individual threads and obtain the desired performance properties for concurrent threads. This prompts our work.

Our register allocator aims to distribute the available registers to threads according to their needs. The compiler analyzes the register needs of each thread both at context-switch points and internally. The compiler then designates some registers as shared and some as private to each thread. Shared registers are allocated across all threads explicitly by the compiler. For safety reasons, values that are live across a context switch cannot be kept in shared registers; thus, only live ranges that are internal (not live across any context switch) can be safely allocated to shared registers. A spill can cause a context switch, so the problems of context switching and register allocation are closely coupled, and we propose a solution to this coupled problem. The proposed interference graphs (GIG, BIG, IIG) distinguish variables that must use a thread's private registers from those that can use shared registers. We first estimate bounds on the register requirement, then reduce gradually from the upper bound to achieve a good register balance among threads. To reduce register needs, move instructions are inserted at program points that split the live ranges, or equivalently the nodes of the interference graph. We show that the lower bound is reachable via live range splitting and is adequate for assigning our benchmark programs to different threads simultaneously. Our objective is to minimize the number of move instructions.

Empirical results show that the compiler is able to effectively control register allocation across threads by maximizing the number of shared registers. Speed-up for performance-critical threads ranges from 18% to 24%, whereas performance degradation for non-critical threads ranges from only 1% to 4%.
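
The private/shared distinction described in the abstract can be illustrated with a small sketch. This is not the authors' allocator: it assumes straight-line code, models each live range as a closed interval of instruction indices, and treats a value as live across a context switch when a switch point falls strictly inside its interval. All names (`LiveRange`, `classify`, `max_overlap`, `plan`, `THREADS`) are hypothetical. It also assumes that, because shared registers never hold a value across a context switch, the shared pool only needs to be as large as the largest single-thread demand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveRange:
    name: str
    start: int  # index of the defining instruction
    end: int    # index of the last use


def crosses_switch(lr, switch_points):
    """True if some context-switch point lies strictly inside the range."""
    return any(lr.start < c < lr.end for c in switch_points)


def classify(live_ranges, switch_points):
    """Split ranges into those needing private registers and those that may share."""
    private = [lr for lr in live_ranges if crosses_switch(lr, switch_points)]
    shared = [lr for lr in live_ranges if not crosses_switch(lr, switch_points)]
    return private, shared


def max_overlap(ranges):
    """Largest number of ranges live at one instruction index (conservative)."""
    events = []
    for lr in ranges:
        events.append((lr.start, 1))
        events.append((lr.end + 1, -1))
    live = best = 0
    for _, delta in sorted(events):  # at equal positions, ends sort before starts
        live += delta
        best = max(best, live)
    return best


def plan(threads):
    """Return (private registers per thread, size of the shared pool)."""
    private = {}
    shared = 0
    for tid, (ranges, switches) in threads.items():
        priv, shr = classify(ranges, switches)
        private[tid] = max_overlap(priv)
        shared = max(shared, max_overlap(shr))  # pool is reused by every thread
    return private, shared


# Hypothetical two-thread example: (live ranges, context-switch points).
THREADS = {
    "A": ([LiveRange("a", 0, 5), LiveRange("b", 1, 2), LiveRange("c", 3, 4)], [3]),
    "B": ([LiveRange("x", 0, 1), LiveRange("y", 0, 2), LiveRange("z", 1, 6)], [4]),
}

private, shared = plan(THREADS)
total_with_sharing = sum(private.values()) + shared
total_without_sharing = sum(max_overlap(r) for r, _ in THREADS.values())
```

In this toy input, only `a` and `z` span a context switch, so each thread needs one private register, and the two threads can reuse a pool of two shared registers. The sketch omits the interference graphs, the bound estimation, and the move insertion for live range splitting that the abstract describes.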