The Thread-Based Protocol Engines for CC-NUMA Multiprocessors

Authors:
Hung-Chang Hsiao;Chung-Ta King
Affiliations:
-;-
Venue:
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Year:
2000

Abstract

With the rapid growth of Internet services, large-scale, high-performance servers such as CC-NUMA multiprocessors are gaining importance in network computing. In a CC-NUMA multiprocessor, the node controller is the key component that connects a computing node to the interconnection network. Node controllers perform protocol processing to exchange messages with other nodes in the system. As new-generation CC-NUMA multiprocessors move towards application-specific protocol processing, a node controller will require very powerful protocol processors, or engines, that are flexible enough to process different kinds of protocols.

In this paper, we study the design of a thread-based node controller in which the protocol engines have a multithreaded architecture. Multithreading allows protocol processing of different requests to proceed in parallel, thereby reducing blocking and improving response time. Four important design parameters for a multithreaded protocol engine are examined: (1) the number of thread context stores, (2) the number of protocol operation units, (3) the scheduling policy, and (4) the thread allocation scheme. From application-driven simulation of six representative applications, we conclude that the numbers of thread contexts and protocol operation units have a great impact on overall system performance. An appropriate thread allocation scheme for invalidation traffic is needed, and prioritizing threads and scheduling them accordingly is important for system performance.
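
To make the role of thread contexts and protocol operation units concrete, the following toy cycle-level model may help. It is only an illustrative sketch and is not the simulator or the engine design used in the paper. The `Handler` class, the `simulate` function, the phase format (alternating compute and wait cycle counts), the oldest-first scheduling order, and the FIFO context allocation are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Handler:
    """One protocol request occupying a thread context (hypothetical model)."""
    phases: list        # alternating [compute, wait, compute, ...] cycle counts, all > 0
    phase: int = 0      # index of the current phase; even = compute, odd = wait
    remaining: int = 0  # cycles left in the current phase


def simulate(requests, num_contexts, num_units):
    """Return the number of cycles needed to finish all requests.

    Assumptions of this sketch: every request is available at cycle 0,
    contexts are allocated in FIFO order, and operation units are granted
    to the oldest handlers that are ready to compute.
    """
    pending = deque(requests)
    active = []  # handlers that currently hold a thread context
    cycle = 0
    while pending or active:
        # Thread allocation: fill free contexts with waiting requests.
        while pending and len(active) < num_contexts:
            phases = list(pending.popleft())
            active.append(Handler(phases, 0, phases[0]))

        # Scheduling: compute phases need a unit; wait phases progress on their own.
        units_left = num_units
        for h in active:
            if h.phase % 2 == 0:
                if units_left > 0:
                    units_left -= 1
                    h.remaining -= 1
            else:
                h.remaining -= 1

        # Move handlers to their next phase and retire the finished ones.
        still_active = []
        for h in active:
            if h.remaining == 0:
                h.phase += 1
                if h.phase < len(h.phases):
                    h.remaining = h.phases[h.phase]
            if h.phase < len(h.phases):
                still_active.append(h)
        active = still_active
        cycle += 1
    return cycle


if __name__ == "__main__":
    # Four identical requests: 2 compute cycles, 6 blocked cycles, 2 compute cycles.
    reqs = [[2, 6, 2]] * 4
    print(simulate(reqs, num_contexts=1, num_units=1))  # 40
    print(simulate(reqs, num_contexts=4, num_units=1))  # 16
    print(simulate(reqs, num_contexts=4, num_units=2))  # 12
```

In this toy setting, a single context serializes the requests, so the operation unit sits idle whenever a handler is blocked. Additional contexts let other handlers compute during those blocked cycles, and a second operation unit removes the remaining contention for compute cycles. The numbers are artifacts of the assumed workload and carry no claim about the paper's measured results.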