We describe the design of a Light Weight Processing migration-NUMA (LWP-mNUMA) architecture, a novel high-performance system design that provides hardware support for a partitioned global address space, migrating subjects, and word-level synchronization primitives. Using the architectural definition, we show how combinations of structures work together to carry out basic actions such as address translation, migration, in-memory synchronization, and work management. We present simulation results for microkernels showing that LWP-mNUMA compensates for latency with far greater memory-access concurrency than is possible on conventional systems. In particular, several microkernels model difficult, irregular access patterns that, in certain problem areas, have limited speedups to dozens of conventional processors. On these, the results show speedup continuing to increase up to 1024 multicore mNUMA processing nodes running over 1 million threadlets.