The Stanford Dash Multiprocessor. Computer.
SPLASH: Stanford parallel applications for shared-memory. ACM SIGARCH Computer Architecture News.
Parallel programming in Split-C. Proceedings of the 1993 ACM/IEEE Conference on Supercomputing.
Where is time spent in message-passing and shared-memory programs? ASPLOS VI: Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems.
The MIT Alewife machine: architecture and performance. ISCA '95: Proceedings of the 22nd Annual International Symposium on Computer Architecture.
The SPLASH-2 programs: characterization and methodological considerations. ISCA '95: Proceedings of the 22nd Annual International Symposium on Computer Architecture.
Shared Memory Versus Message Passing for Iterative Solution of Sparse, Irregular Problems. ICS '97: Proceedings of the 11th International Conference on Supercomputing.
Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor. Proceedings of the 25th Annual International Symposium on Computer Architecture.
An integer linear programming approach for optimizing cache locality. ICS '99: Proceedings of the 13th International Conference on Supercomputing.
Performance Metrics for Embedded Parallel Pipelines. IEEE Transactions on Parallel and Distributed Systems.
Static and Dynamic Locality Optimizations Using Integer Linear Programming. IEEE Transactions on Parallel and Distributed Systems.
Quasidynamic Layout Optimizations for Improving Data Locality. IEEE Transactions on Parallel and Distributed Systems.
The architecture of parallel machines influences the structure of parallel programs, and vice versa; this symbiotic cycle has been in motion for at least a decade. One important product of research on shared-memory applications is a pair of benchmark suites, Splash and NAS, that have driven much work on shared-memory architectures and cache-coherence protocols. At the same time, successive generations of shared-memory multiprocessors (the Stanford Dash, the MIT Alewife, and the Wisconsin Typhoon) have been used to characterize the behavior of shared-memory applications.

New applications and architectural mechanisms are now emerging. In particular, fine-grain applications are an important emerging class that warrants further study. Because they communicate frequently and are sensitive to memory latency, these applications have long been thought to favor message-passing architectures over shared memory.

The authors present the performance of 14 applications on the Alewife machine, spanning both coarse- and fine-grain parallelism. Not surprisingly, Alewife's mechanisms support good performance on traditional coarse-grain applications from the Splash and NAS benchmark suites. More notably, the authors show that Alewife also provides an excellent communication mechanism for fine-grain applications. The results confirm that hardware support for a limited degree of sharing is adequate for a broad range of applications, even on large numbers of processors. Local cache-miss behavior turns out to be important on multiprocessors with low remote-miss latencies. Overall, the results show that Alewife's low-latency handling of both local and remote misses makes fine-grain applications viable candidates for shared-memory parallel processing.
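The latency sensitivity of fine-grain applications can be made concrete with a back-of-the-envelope model (my illustration, not taken from the paper): treat per-processor runtime as local compute cycles plus stall cycles on remote references. All numbers below are made up for illustration.

```python
def runtime(work_per_proc, remote_refs, remote_latency):
    """Estimated cycles per processor: local compute plus remote-miss stalls."""
    return work_per_proc + remote_refs * remote_latency

# Coarse-grain task: much computation per remote reference.
coarse = runtime(work_per_proc=1_000_000, remote_refs=1_000, remote_latency=100)

# Fine-grain task: same total work, but far more remote references.
fine = runtime(work_per_proc=1_000_000, remote_refs=100_000, remote_latency=100)

# The same fine-grain task on a machine with 10x lower remote-miss latency.
fine_low_latency = runtime(1_000_000, 100_000, remote_latency=10)
```

With these (hypothetical) parameters, the fine-grain task's runtime is dominated by communication stalls at the higher latency, while the coarse-grain task is barely affected; cutting the remote-miss latency shrinks the fine-grain penalty dramatically. This is the intuition behind the review's point that low-latency miss handling, as on Alewife, is what makes fine-grain applications viable on shared memory.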