Request Combining in Multiprocessors with Arbitrary Interconnection Networks

Authors:
Alvin R. Lebeck; Gurindar S. Sohi
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1994



Abstract

Several techniques have been proposed to allow parallel access to a shared memory location by combining requests. They have one or more of the following attributes: a requirement for a priori knowledge of the requests to combine, restrictions on the routing of messages in the network, or the use of sophisticated interconnection network nodes. We present a new method of combining requests that has none of these requirements. We obtain this method by developing a classification scheme for the existing methods of request combining. The classification is facilitated by separating the request combining process into a two-part operation: determining the combining set, which is the set of requests that participate in a combined access, and distributing the results of the combined access to the members of the combining set. Combining strategies are classified according to which system component, the processor elements or the interconnection network, performs each of these tasks. Our approach, which uses the interconnection network to establish the combining set and the processor elements to distribute the results, lies in a previously unexplored area of this design space. We also present simulation results to assess the benefits of the proposed approach.
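The two-part decomposition above (network determines the combining set, processors distribute the results) can be illustrated with a minimal Python sketch. It assumes a fetch-and-add operation, a message that carries its member list as (processor id, increment) pairs, and stateless network nodes; these choices and all function names (`combine_in_network`, `memory_fetch_and_add`, `distribute_at_processor`) are hypothetical illustrations, not the authors' actual protocol or message format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Request:
    """Hypothetical fetch-and-add request. `members` records the combining
    set as (processor id, increment) pairs in the order they were merged."""
    addr: int
    increment: int
    members: List[Tuple[int, int]] = field(default_factory=list)


def make_request(pid: int, addr: int, inc: int) -> Request:
    """An uncombined request: its combining set is just the issuing processor."""
    return Request(addr, inc, [(pid, inc)])


def combine_in_network(a: Request, b: Request) -> Request:
    """Phase 1 (network): two requests to the same address that meet at a
    node are merged into one. In this sketch the node stores nothing; the
    member list travels with the merged message (an assumption)."""
    assert a.addr == b.addr, "only requests to the same address combine"
    return Request(a.addr, a.increment + b.increment, a.members + b.members)


def memory_fetch_and_add(memory: Dict[int, int], req: Request) -> int:
    """One memory access serves the whole combining set; returns the old value."""
    old = memory.get(req.addr, 0)
    memory[req.addr] = old + req.increment
    return old


def distribute_at_processor(old: int, members: List[Tuple[int, int]]) -> Dict[int, int]:
    """Phase 2 (processor): the processor receiving the reply computes each
    member's result as an exclusive running sum, i.e. as if the requests
    had executed one after another in member-list order."""
    results: Dict[int, int] = {}
    running = old
    for pid, inc in members:
        results[pid] = running
        running += inc
    return results


if __name__ == "__main__":
    memory = {100: 10}
    merged = combine_in_network(
        combine_in_network(make_request(0, 100, 1), make_request(1, 100, 2)),
        make_request(2, 100, 3),
    )
    old = memory_fetch_and_add(memory, merged)
    print(distribute_at_processor(old, merged.members))  # {0: 10, 1: 11, 2: 13}
    print(memory[100])  # 16
```

Three requests reach memory as a single access, yet each processor receives the value it would have seen under some sequential ordering. Keeping the member list in the message rather than in the switch is what lets the network nodes stay simple in this sketch; the cost is that a processor must do the distribution work afterward.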