Alewife is a multiprocessor architecture that supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node. The MIT Alewife machine, a prototype implementation of the architecture, demonstrates that a parallel system can be both scalable and programmable. Four mechanisms combine to achieve these goals: software-extended coherent shared memory provides a global, linear address space; integrated message passing allows compiler and operating system designers to provide efficient communication and synchronization; support for fine-grain computation allows many processors to cooperate on small problem sizes; and latency tolerance mechanisms (including block multithreading and prefetching) mask unavoidable delays due to communication.

Microbenchmarks, together with over a dozen complete applications running on the 32-node prototype, help to analyze the behavior of the system. Analysis shows that integrating message passing with shared memory enables a cost-efficient solution to the cache coherence problem and provides a rich set of programming primitives. Block multithreading and prefetching improve performance by up to 25% individually, and 35% together. Finally, language constructs that allow programmers to express fine-grain synchronization can improve performance by over a factor of two.
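To make the idea of fine-grain synchronization concrete, the following is a minimal software sketch, not Alewife's actual mechanism or language constructs. It assumes a write-once cell that carries a full/empty flag, so a consumer waits on an individual data element rather than on a barrier covering the whole array. The names `SyncCell` and `pipeline_sum` are hypothetical and chosen only for illustration.

```python
import threading


class SyncCell:
    """One data word paired with a full/empty flag (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write(self, value):
        # Fill the cell and wake every reader waiting on it.
        with self._cond:
            if self._full:
                raise RuntimeError("cell already full")
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read(self):
        # Block until the producer has filled this particular cell.
        with self._cond:
            while not self._full:
                self._cond.wait()
            return self._value


def pipeline_sum(n):
    """Consumer sums n cells while a producer fills them one by one."""
    cells = [SyncCell() for _ in range(n)]

    def producer():
        for i, cell in enumerate(cells):
            cell.write(i * i)

    worker = threading.Thread(target=producer)
    worker.start()
    # The consumer can start on cell 0 as soon as it is full,
    # without waiting for the producer to finish the whole array.
    total = sum(cell.read() for cell in cells)
    worker.join()
    return total
```

In this sketch the consumer overlaps with the producer element by element. A coarse-grain version would join the producer thread before reading anything, which serializes the two phases.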