Performance evaluation of memory consistency models for shared-memory multiprocessors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Experimental comparison of memory management policies for NUMA multiprocessors
ACM Transactions on Computer Systems (TOCS)
The Stanford Dash Multiprocessor
Computer
Comparative performance evaluation of cache-coherent NUMA and COMA architectures
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The performance impact of flexibility in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fast, contention-free combining tree barriers for shared-memory multiprocessors
International Journal of Parallel Programming
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
S-connect: from networks of workstations to supercomputer performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Zero-cycle loads: microarchitecture support for reducing load latency
Proceedings of the 28th annual international symposium on Microarchitecture
Decoupled hardware support for distributed shared memory
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Efficient synchronization: let them eat QOLB
Proceedings of the 24th annual international symposium on Computer architecture
Coherence controller architectures for SMP-based CC-NUMA multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
Reactive NUMA: a design for unifying S-COMA and CC-NUMA
Proceedings of the 24th annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
Using CSIM to model complex systems
WSC '88 Proceedings of the 20th conference on Winter simulation
Memory consistency and event ordering in scalable shared-memory multiprocessors
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
APRIL: a processor architecture for multiprocessing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Earthquake ground motion modeling on parallel computers
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
Interconnection Networks: An Engineering Approach
Interconnection Networks: An Engineering Approach
MICA: A Memory and Interconnect Simulation Environment for Cache-Based Architectures
SS '00 Proceedings of the 33rd Annual Simulation Symposium
Boosting the Performance of NOW-Based Shared Memory Multiprocessors Through Directory Hints
ICDCS '00 Proceedings of the The 20th International Conference on Distributed Computing Systems ( ICDCS 2000)
Does Multicast Communication Make Sense in Write Invalidation Traffic?
ICPADS '00 Proceedings of the Seventh International Conference on Parallel and Distributed Systems
Optimizing systems by work schedules: (a stochastic approach)
WOSP '02 Proceedings of the 3rd international workshop on Software and performance
Hi-index | 0.00 |
With the vast advances of Internet services, large-scale and high-performance servers, such as CC-NUMA multiprocessors, are gaining importance in network computing. In a CC-NUMA multiprocessor, the key component to connect a computing node to the interconnection network is the node controller. Node controllers perform protocol processing to transmit messages with other nodes in the system. As the new generation, CC-NUMA multiprocessors are moving towards application-specific protocol processing, a node controller will require very powerful protocol processors or engines to provide the flexibility of processing different kinds of protocols.In this paper, we study the design of a thread-based node controller, in which protocol engines have a multithreaded architecture. Multi-threading allows protocol processing of different requests to proceed in parallel, whereby reducing blocking and improving response time. Four important design parameters for a multithreaded protocol engine are examined: (1) the number of thread context storages, (2) the number of protocol operation units, (3) the scheduling policy and (4) the thread allocation scheme. From the application-driven simulation on six representative applications, we conclude that the number of threads contexts and protocol operation units have a great impact on the overall system performance. An appropriate thread allocation scheme for invalidation traffic is needed, and prioritizing a thread and scheduling it accordingly are important for the system performance.