Distributing Hot-Spot Addressing in Large-Scale Multiprocessors

Authors:
P.-C. Yew;N.-F. Tzeng;D. H. Lawrie
Venue:
IEEE Transactions on Computers
Year:
1987

Citing 8
Cited 83

Solution of a problem in concurrent programming control

Communications of the ACM
On “hot spot” contention

ACM SIGARCH Computer Architecture News
A large scale, homogeneous, fully distributed parallel machine, I

ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
The NYU Ultracomputer Designing an MIMD Shared Memory Parallel Computer

IEEE Transactions on Computers
Analysis and Simulation of Buffered Delta Networks

IEEE Transactions on Computers
On a Class of Multistage Interconnection Networks

IEEE Transactions on Computers
Access and Alignment of Data in an Array Processor

IEEE Transactions on Computers
The Performance of Multistage Interconnection Networks for Multiprocessors

IEEE Transactions on Computers

The fuzzy barrier: a mechanism for high speed synchronization of processors

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Efficient synchronization primitives for large-scale cache-coherent multiprocessors

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Using feedback to control tree saturation in multistage interconnection networks

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Adaptive backoff synchronization techniques

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Parallel Quicksort Using Fetch-And-Add

IEEE Transactions on Computers
Cache considerations for multiprocessor programmers

Communications of the ACM
Directory-Based Cache Coherence in Large-Scale Multiprocessors

Computer
Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems (TOCS)
Synchronization without contention

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Fast barrier synchronization hardware

Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Traffic studies of unbuffered Delta networks

IBM Journal of Research and Development
An argument against scalable cache coherency

ACM SIGARCH Computer Architecture News
Alleviation of tree saturation in multistage interconnection networks

Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Hot-Spot Contention in Binary Hypercube Networks

IEEE Transactions on Computers
Low contention load balancing on large-scale multiprocessors

SPAA '92 Proceedings of the fourth annual ACM symposium on Parallel algorithms and architectures
Cache Invalidation Patterns in Shared-Memory Multiprocessors

IEEE Transactions on Computers
An effective synchronization network for hot-spot accesses

ACM Transactions on Computer Systems (TOCS)
A Cost-Effective Combining Structure for Large-Scale Shared-Memory Multiprocessors

IEEE Transactions on Computers
Waiting algorithms for synchronization in large-scale multiprocessors

ACM Transactions on Computer Systems (TOCS)
Fast, scalable synchronization with minimal hardware support

PODC '93 Proceedings of the twelfth annual ACM symposium on Principles of distributed computing
Diffracting trees (preliminary version)

SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Request Combining in Multiprocessors with Arbitrary Interconnection Networks

IEEE Transactions on Parallel and Distributed Systems
Measurement-based characterization of global memory and network contention, operating system and parallelization overheads

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Reactive synchronization algorithms for multiprocessors

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Using a Multipath Network for Reducing the Effects of Hot Spots

IEEE Transactions on Parallel and Distributed Systems
A Hierarchical Task Queue Organization for Shared-Memory Multiprocessor Systems

IEEE Transactions on Parallel and Distributed Systems
Prevention of Congestion in Packet-Switched Multistage Interconnection Networks

IEEE Transactions on Parallel and Distributed Systems
Distributed Hardwired Barrier Synchronization for Scalable Multiprocessor Clusters

IEEE Transactions on Parallel and Distributed Systems
Scalable concurrent counting

ACM Transactions on Computer Systems (TOCS)
The communication requirements of mutual exclusion

Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Design and Performance Analysis of Load-Distributing Fault-Tolerant Network

IEEE Transactions on Computers
Diffracting trees

ACM Transactions on Computer Systems (TOCS)
Scheduler-conscious synchronization

ACM Transactions on Computer Systems (TOCS)
Reactive diffracting trees

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Distributed shared memory systems with improved barrier synchronization and data transfer

ICS '97 Proceedings of the 11th international conference on Supercomputing
Combining funnels: a new twist on an old tale…

PODC '98 Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing
Designing Tree-Based Barrier Synchronization on 2D Mesh Networks

IEEE Transactions on Parallel and Distributed Systems
Scalable concurrent priority queue algorithms

Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
Restricted Fetch and Φ operations for parallel processing

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Performance analysis of buffered banyan networks under nonuniform traffic

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Synchronization with multiprocessor caches

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The Postal Network: A Recursive Network for Parameterized Communication Model

The Journal of Supercomputing
Reducing Hot-Spot Contention in Shared-Memory Multiprocessor Systems

IEEE Concurrency
The Message-Driven Processor: A Multicomputer Processing Node with Efficient Mechanisms

IEEE Micro
Performance Evaluation of Hierarchical Ring-Based Shared Memory Multiprocessors

IEEE Transactions on Computers
The Use of Feedback in Multiprocessors and Its Application to Tree Saturation Control

IEEE Transactions on Parallel and Distributed Systems
Evaluation of Two Traffic Distribution Strategies for a Dual-Network Multiprocessor System

IEEE Transactions on Parallel and Distributed Systems
Performance of Pruning-Cache Directories for Large-Scale Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Mesh Interconnection Networks with Deterministic Routing

IEEE Transactions on Parallel and Distributed Systems
Hierarchical Compilation of Macro Dataflow Graphs for Multiprocessors with Local Memory

IEEE Transactions on Parallel and Distributed Systems
Analysis of Finite Buffered Multistage Combining Networks

IEEE Transactions on Parallel and Distributed Systems
Optimal Hot Spot Allocation on Meshes for Large-Scale Data-Parallel Algorithms

IEEE Transactions on Parallel and Distributed Systems
A Circular List-Based Mutual Exclusion Scheme for Large Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Simulation Data Structures for Parallel Resource Management

IEEE Transactions on Software Engineering
Integrated Network Barriers for D-Dimensional Meshes

PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Thread prioritization: a thread scheduling mechanism for multiple-context parallel processors

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
The core Legion object model

HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Support for extensibility and site autonomy in the Legion grid system object model

Journal of Parallel and Distributed Computing - Special issue on computational grids
Switch fabric architecture analysis for a scalable bi-directionally reconfigurable IP router

Journal of Systems Architecture: the EUROMICRO Journal
Read-modify-write networks

Distributed Computing
Fast synchronization on shared-memory multiprocessors: An architectural approach

Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Switch fabric design for high performance IP routers: a survey

Journal of Systems Architecture: the EUROMICRO Journal
Destination-Based HoL Blocking Elimination

ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Circulating shared-registers for multiprocessor systems

Journal of Systems Architecture: the EUROMICRO Journal
Self-tuning reactive diffracting trees

Journal of Parallel and Distributed Computing
Scalable barrier synchronisation for large-scale shared-memory multiprocessors

International Journal of High Performance Computing and Networking
Avoiding unbounded priority inversion in barrier protocols using gang priority management

Proceedings of the 7th International Workshop on Java Technologies for Real-Time and Embedded Systems
Low-contention data structures

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Flat combining and the synchronization-parallelism tradeoff

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Fast barrier synchronization for InfiniBand™

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A highly-efficient wait-free universal construction

Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
Supporting OpenMP on a multi-cluster embedded MPSoC

Microprocessors & Microsystems
Dynamic evolution of congestion trees: analysis and impact on switch architecture

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Constructing shared objects that are both robust and high-throughput

DISC'06 Proceedings of the 20th international conference on Distributed Computing
A methodology for creating fast wait-free data structures

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Revisiting the combining synchronization technique

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Low-contention data structures

Journal of Parallel and Distributed Computing
A dynamic elimination-combining stack algorithm

OPODIS'11 Proceedings of the 15th international conference on Principles of Distributed Systems
VRSync: characterizing and eliminating synchronization-induced voltage emergencies in many-core processors

Proceedings of the 39th Annual International Symposium on Computer Architecture
NUMA-aware shared-memory collective communication for MPI

Proceedings of the 22nd international symposium on High-performance parallel and distributed computing
Location-aware cache management for many-core processors with deep cache hierarchy

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A new proposal to deal with congestion in InfiniBand-based fat-trees

Journal of Parallel and Distributed Computing
On Automation in the Verification of Software Barriers: Experience Report

Journal of Automated Reasoning

Abstract

When a large number of processors try to access a common variable, a pattern referred to as hot-spot accesses in [6], the resulting memory contention can seriously degrade performance. It can also cause tree saturation in the interconnection network, which blocks hot and regular requests alike. It is shown in [6] that even if only a small percentage of all requests are to a hot spot, these requests can cause very serious performance problems; networks that combine requests are suggested there to keep the interconnection network and memory contention from becoming a bottleneck.
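
The claim that a small hot-spot fraction can dominate performance can be illustrated with a back-of-the-envelope saturation bound. The sketch below is not taken from this abstract. It assumes N processors and N memory modules, one accepted request per module per cycle, and non-hot requests spread uniformly over the modules. The function name and parameters are hypothetical, chosen only for illustration.

```python
def max_request_rate(num_processors: int, hot_fraction: float) -> float:
    """Upper bound on the per-processor request rate (requests per cycle)
    before the hot memory module saturates.

    Assumptions (illustrative, not from the source): as many memory modules
    as processors, each module serves one request per cycle, and non-hot
    requests are uniformly distributed over all modules.
    """
    if num_processors < 1:
        raise ValueError("num_processors must be at least 1")
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must lie in [0, 1]")
    # At per-processor rate r, the hot module receives per cycle:
    #   r * h * N   hot-spot requests from all N processors, plus
    #   r * (1 - h) as its uniform share of the remaining traffic.
    # Requiring that load to stay <= 1 gives r <= 1 / (1 + h * (N - 1)).
    return 1.0 / (1.0 + hot_fraction * (num_processors - 1))


if __name__ == "__main__":
    for h in (0.0, 0.001, 0.01, 0.05):
        bound = max_request_rate(1000, h)
        print(f"N=1000, hot fraction={h:.3f}: max rate per processor = {bound:.4f}")
```

Under these assumptions, a 1% hot-spot fraction on 1000 processors caps each processor at roughly 9% of its peak request rate. Because tree saturation backs up the shared network paths, that cap applies to regular requests as well, not only to the hot ones.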