The Illinois Aggressive COMA Multiprocessor Project (I-ACOMA)

Authors:
J. Torrellas; D. Padua
Venue:
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Year:
1996



Abstract

Although scalable shared-memory multiprocessors with hardware-assisted cache coherence are relatively easy to program, they still require substantial programmer effort if truly high performance is desired. For example, data must be allocated close to the processors that will use them, and the application must be tuned so that its working set fits in the caches. This is unfortunate, because the most important obstacle to the widespread use of parallel computing is the difficulty of programming parallel machines. The goal of the I-ACOMA project is to explore how to design a highly programmable, high-performance multiprocessor. The authors focus on a flat-COMA (cache-only memory architecture) scalable multiprocessor supported by a parallelizing compiler. The main issues they are studying are advanced processor organizations, techniques to handle long memory access latencies, and support for important classes of workloads, such as databases and scientific applications with loops that the compiler cannot analyze. The project also involves building a prototype that includes some of the features discussed.
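Loops that the compiler cannot analyze typically have subscripts taken from index arrays whose contents are known only at run time. One common way to handle them is speculative run-time parallelization. The loop runs in parallel while recording which elements each iteration reads and writes; the record is then checked for cross-iteration dependences, and the loop is re-executed serially if any are found. The sketch below is a much-simplified software analogue in the spirit of the LRPD test of Rauchwerger and Padua, with no privatization or reduction handling. The loop body and the names `run_loop` and `has_cross_iteration_dependence` are hypothetical, and this is not presented as I-ACOMA's actual hardware mechanism.

```python
from typing import List, Sequence, Tuple


def has_cross_iteration_dependence(reads: Sequence[Sequence[int]],
                                   writes: Sequence[Sequence[int]],
                                   n_elems: int) -> bool:
    """Shadow-array style test (simplified, hypothetical): True if any
    element is written by one iteration and read or written by another."""
    writers: List[set] = [set() for _ in range(n_elems)]
    readers: List[set] = [set() for _ in range(n_elems)]
    for it, elems in enumerate(writes):
        for e in elems:
            writers[e].add(it)
    for it, elems in enumerate(reads):
        for e in elems:
            readers[e].add(it)
    for e in range(n_elems):
        w = writers[e]
        if len(w) > 1:
            return True          # output dependence between iterations
        if w and readers[e] - w:
            return True          # flow/anti dependence between iterations
    return False


def run_loop(a: List[int], read_idx: Sequence[int],
             write_idx: Sequence[int]) -> Tuple[List[int], bool]:
    """Hypothetical loop body: a[write_idx[i]] = a[read_idx[i]] + 1.

    Returns (result, parallel_ok). Because the index arrays are only known
    at run time, a compiler could not prove the iterations independent.
    """
    n = len(read_idx)
    spec = list(a)                       # speculative copy; `a` kept for rollback
    reads: List[List[int]] = [[] for _ in range(n)]
    writes: List[List[int]] = [[] for _ in range(n)]
    # Run iterations in reverse order to mimic an arbitrary parallel
    # schedule, marking every access in per-iteration shadow lists.
    for i in reversed(range(n)):
        reads[i].append(read_idx[i])
        writes[i].append(write_idx[i])
        spec[write_idx[i]] = spec[read_idx[i]] + 1
    if not has_cross_iteration_dependence(reads, writes, len(a)):
        return spec, True                # speculation succeeded; keep results
    # Speculation failed: discard `spec` and re-execute serially from `a`.
    out = list(a)
    for i in range(n):
        out[write_idx[i]] = out[read_idx[i]] + 1
    return out, False
```

For instance, `run_loop([5, 0, 0, 0], [0, 0, 0], [1, 2, 3])` keeps the speculative result `[5, 6, 6, 6]`, since every iteration writes a distinct element. With `read_idx=[0, 1, 2]` and `write_idx=[1, 2, 3]`, each iteration reads the previous iteration's output. The test fails, and serial re-execution produces `[0, 1, 2, 3]`. A hardware-supported scheme would keep the access marks in the memory system instead of in software lists, but the pass/fail logic is the same idea.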