A practical way to extend shared memory support beyond a motherboard at low cost

Authors:
Héctor Montaner; Federico Silla; José Duato
Affiliations:
Universitat Politècnica de València, València, Spain; Universitat Politècnica de València, València, Spain; Universitat Politècnica de València, València, Spain
Venue:
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Year:
2010

Abstract

Improvements in parallel computing hardware usually involve increasing the resources available to a given application, such as the number of computing cores and the amount of memory. In shared-memory computers, this growth in computing resources and memory is usually constrained by the coherency protocol, whose overhead rises with system size and limits the scalability of the final system. In this paper we propose an efficient and cost-effective way to increase the memory available to a given application by leveraging free memory in other computers in the cluster. Our proposal is based on the observation that many applications benefit from more memory but do not require more computing cores; this reduces the requirements for cache coherency and allows a simpler implementation and better scalability. Simulation results show that, when additional mechanisms intended to hide remote memory latency are used, the execution time of applications using our proposal is similar to their execution time on a computer populated with enough local memory, which validates the feasibility of our proposal. We are currently building a prototype that implements our ideas.