Evaluating scheduling policies for fine-grain communication protocols on a cluster of SMPs

Authors:
Babak Falsafi; David A. Wood
Affiliations:
Computer Architecture Laboratory, Carnegie Mellon University, Hamerschlag Hall A305, Pittsburgh, PA 15213, USA; Computer Sciences Department, University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI 53706, USA
Venue:
Journal of Parallel and Distributed Computing
Year:
2005

Abstract

Distributed-memory parallel computers and networks of workstations (NOWs) both rely on efficient communication over increasingly high-speed networks. Software communication protocols are often the performance bottleneck. Several current and proposed parallel systems address this problem by dedicating one general-purpose processor in a symmetric multiprocessor (SMP) node specifically to protocol processing. This convention reduces communication latency and increases effective bandwidth, but it also reduces peak performance, since the dedicated processor no longer performs computation. In this paper, we study a parallel machine with SMP nodes and compare two protocol processing policies: the Fixed policy, which uses a dedicated protocol processor, and the Floating policy, in which all processors perform both computation and protocol processing. The results from synthetic microbenchmarks and five macrobenchmarks show that: (i) a dedicated protocol processor benefits light-weight protocols much more than heavy-weight protocols; (ii) a dedicated protocol processor is generally advantageous when there are four or more processors per node; (iii) multiprocessor node performance is not as sensitive to interrupt overhead as uniprocessor node performance, because a message arrival is likely to find an idle processor on a multiprocessor node, thereby eliminating interrupts; and (iv) the system with the lowest cost-performance will include a dedicated protocol processor when interrupt overheads are much higher than protocol weight, as is the case for light-weight protocols.
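
The trade-off between the Fixed and Floating policies can be illustrated with a small back-of-the-envelope sketch. This is not the authors' model or methodology: it assumes, purely for illustration, that each processor is busy independently with the same probability, that the Floating policy pays the interrupt overhead only when every processor on the node is busy, and that the Fixed policy never takes an interrupt but gives up one processor's worth of compute. All function and parameter names are hypothetical.

```python
# Toy model of Fixed vs. Floating protocol processing on one SMP node.
# Assumptions (not from the paper): processors are busy independently with
# probability `busy_fraction`; costs are in arbitrary time units.

def interrupt_probability(busy_fraction, processors):
    """Chance that a message arrival finds no idle processor on the node."""
    return busy_fraction ** processors

def floating_overhead(protocol_weight, interrupt_overhead, busy_fraction, processors):
    """Expected per-message cost when any processor may run the protocol.

    An idle processor handles the message with no interrupt; otherwise a
    busy processor is interrupted and pays the interrupt overhead as well.
    """
    p_interrupt = interrupt_probability(busy_fraction, processors)
    return protocol_weight + p_interrupt * interrupt_overhead

def fixed_overhead(protocol_weight):
    """Per-message cost with a dedicated protocol processor (no interrupts)."""
    return protocol_weight

def fixed_compute_fraction(processors):
    """Share of the node's processors still available for computation."""
    return (processors - 1) / processors
```

Under these assumptions, with processors busy half the time, a uniprocessor node takes an interrupt on half of all arrivals (`interrupt_probability(0.5, 1)`), while a four-processor node does so on only one arrival in sixteen (`interrupt_probability(0.5, 4)`). The same four-processor node under the Fixed policy keeps three quarters of its processors for computation (`fixed_compute_fraction(4)`), which is the cost side of the comparison.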