We present the design for the NYU Ultracomputer, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements. This machine uses an enhanced message switching network with the geometry of an Omega-network to approximate the ideal behavior of Schwartz's paracomputer model of computation and to implement efficiently the important fetch-and-add synchronization primitive. We outline the hardware that would be required to build a 4096 processor system using 1990s technology. We also discuss system software issues, and present analytic studies of the network performance. Finally, we include a sample of our effort to implement and simulate parallel variants of important scientific programs.
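
For readers unfamiliar with the fetch-and-add primitive named above, the following is a minimal software sketch of its semantics only: the operation atomically returns the old value of a shared cell and adds an increment to it. The class `FetchAndAddCell` and the function `claim_tickets` are hypothetical names introduced for illustration, and the lock is an assumed stand-in for atomicity; it does not model the machine's network or hardware.

```python
import threading


class FetchAndAddCell:
    """Shared integer cell with a fetch-and-add operation (illustrative only)."""

    def __init__(self, value=0):
        self._value = value
        # Assumption: a lock stands in for the atomicity a real machine provides.
        self._lock = threading.Lock()

    def fetch_and_add(self, increment):
        """Atomically add `increment` to the cell and return the OLD value."""
        with self._lock:
            old = self._value
            self._value = old + increment
            return old


def claim_tickets(num_workers=8, claims_per_worker=100):
    """Each worker claims indices with fetch_and_add(1); no index is handed out twice."""
    cell = FetchAndAddCell(0)
    claimed = [[] for _ in range(num_workers)]

    def worker(worker_id):
        for _ in range(claims_per_worker):
            claimed[worker_id].append(cell.fetch_and_add(1))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    tickets = [ticket for per_worker in claimed for ticket in per_worker]
    # fetch_and_add(0) reads the current value without changing it.
    return tickets, cell.fetch_and_add(0)
```

Calling `claim_tickets()` returns every claimed index together with the final counter value. Because each call returns a distinct old value, concurrent workers can share a counter, for example to pick the next free slot in a queue, without any further coordination.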