CableS: Thread Control and Memory System Extensions for Shared Virtual Memory Clusters

Authors:
Peter Jamieson;Angelos Bilas
Venue:
WOMPAT '01 Proceedings of the International Workshop on OpenMP Applications and Tools: OpenMP Shared Memory Parallel Programming
Year:
2001

Abstract

Clusters of high-end workstations and PCs are currently used in many application domains to perform large-scale computations or as scalable servers for I/O-bound tasks. Although clusters have many advantages, their applicability in new areas, especially commercial applications, has been limited. One of the main reasons is that clusters do not provide a single system image and are therefore hard to program. In this work we address this problem by giving programmers a single cluster image with respect to thread and memory management. The main limitation of our system is that it does not yet provide file system and networking support across cluster nodes. We implement our system on a 16-processor cluster interconnected with a low-latency, high-bandwidth system area network. We demonstrate the versatility of our system with a wide range of applications. We show that clusters can support applications that were written for more expensive tightly-coupled systems, with very little programmer effort. Finally, we show that, for applications that have been tuned for the shared memory abstraction, the overhead introduced by the extra functionality of CableS affects the parallel section only when the system's data placement policy places data improperly because of operating system limitations in the granularity of virtual memory mappings.
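
The last point, that coarse virtual memory mapping granularity can cause improper data placement, can be illustrated with a small model. The sketch below is not taken from the paper: it assumes a first-touch placement policy, four nodes, 16 KB partitions per node, and two hypothetical mapping granularities (4 KB and 64 KB). All function names are invented for illustration.

```python
# Hypothetical model of data placement at a coarse mapping granularity.
# Assumptions (not from the abstract): first-touch placement, 4 nodes,
# 16 KB of data owned by each node, granularities of 4 KB and 64 KB.

def first_touch_homes(touches, granularity):
    """Assign each mapping unit to the node that touches it first.

    touches: list of (node, byte_address) in the order accesses occur.
    granularity: size in bytes of the smallest unit that can be mapped.
    Returns a dict {unit_index: home_node}.
    """
    homes = {}
    for node, addr in touches:
        homes.setdefault(addr // granularity, node)
    return homes


def misplaced_bytes(partitions, homes, granularity):
    """Count bytes whose home node differs from the node that owns them.

    partitions: list of (node, start, end) half-open byte ranges.
    """
    misplaced = 0
    for node, start, end in partitions:
        addr = start
        while addr < end:
            unit = addr // granularity
            chunk = min(end, (unit + 1) * granularity) - addr
            if homes.get(unit) != node:
                misplaced += chunk
            addr += chunk
    return misplaced


def demo(granularity, nodes=4, part_size=16 * 1024):
    """Each node initializes its own contiguous partition, node 0 first."""
    partitions = [(n, n * part_size, (n + 1) * part_size) for n in range(nodes)]
    touches = [(n, a) for n, s, e in partitions for a in range(s, e, granularity)]
    homes = first_touch_homes(touches, granularity)
    return misplaced_bytes(partitions, homes, granularity)


if __name__ == "__main__":
    for g in (4 * 1024, 64 * 1024):
        print(f"granularity {g:6d} B -> misplaced {demo(g)} B")
```

Under these assumptions, a 4 KB granularity lets every partition be homed at its owner, while a 64 KB granularity forces all four partitions into one mapping unit that the first node claims, so three quarters of the data ends up remote to the node that uses it.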