When a large number of processors try to access a common variable, a pattern referred to as hot-spot access in [6], the resulting memory contention can seriously degrade performance. It can also cause tree saturation in the interconnection network, which blocks hot and regular requests alike. It is shown in [6] that even if only a small percentage of all requests are directed at a hot spot, these requests can cause very serious performance problems. Networks that combine requests are suggested there to keep the interconnection network and memory contention from becoming a bottleneck.
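To make the idea of combining concrete, the following is a minimal sketch, not the mechanism of [6] or of this paper. It assumes the hot-spot requests are fetch-and-add operations (the abstract does not name the operation) and that the switches form a binary tree. The function names `fetch_and_add_serial` and `fetch_and_add_combined` are hypothetical. Each switch merges the two requests from its children into one, so a single request reaches the hot memory location, and the reply is split again on the way back down.

```python
def fetch_and_add_serial(memory, addr, increments):
    """Baseline without combining: every request reaches memory."""
    replies = []
    for inc in increments:
        replies.append(memory[addr])   # each processor sees the old value
        memory[addr] += inc
    return replies, len(increments)    # one memory access per request


def fetch_and_add_combined(memory, addr, increments):
    """Illustrative binary combining tree (an assumed topology)."""
    if not increments:
        return [], 0
    replies = [None] * len(increments)

    def up(lo, hi):
        # Forward phase: merge the requests of processors lo..hi-1.
        # A node is (total increment, left child, right child, leaf index).
        if hi - lo == 1:
            return (increments[lo], None, None, lo)
        mid = (lo + hi) // 2
        left, right = up(lo, mid), up(mid, hi)
        return (left[0] + right[0], left, right, lo)

    def down(node, base):
        # Return phase: split the reply using the remembered left total.
        total, left, right, leaf = node
        if left is None:
            replies[leaf] = base
            return
        down(left, base)
        down(right, base + left[0])

    root = up(0, len(increments))
    old = memory[addr]                 # the only access to the hot spot
    memory[addr] = old + root[0]
    down(root, old)
    return replies, 1
```

In this sketch the combined version returns the same replies as the serial order of requests, while the hot memory location is accessed once rather than once per processor. A real network combines only requests that happen to meet in a switch queue, so the actual reduction would depend on timing and buffer sizes.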