Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
The Stanford Dash Multiprocessor
Computer
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
*T: a multithreaded massively parallel architecture
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A tightly-coupled processor-network interface
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Efficient superscalar performance through boosting
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Simulation of multiprocessors: accuracy and performance
Simulation of multiprocessors: accuracy and performance
The J-machine multicomputer: an architectural evaluation
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Evaluation of mechanisms for fine-grained parallel programs in the J-machine and the CM-5
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Anatomy of a message in the Alewife multiprocessor
ICS '93 Proceedings of the 7th international conference on Supercomputing
Cache coherence directories for scalable multiprocessors
Cache coherence directories for scalable multiprocessors
The directory-based cache coherence protocol for the DASH multiprocessor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
The MIT Alewife machine: a large-scale distributed-memory multiprocessor
The MIT Alewife machine: a large-scale distributed-memory multiprocessor
Integrating multiple communication paradigms in high performance multiprocessors
Integrating multiple communication paradigms in high performance multiprocessors
Performance evaluation of hybrid hardware and software distributed shared memory protocols
ICS '94 Proceedings of the 8th international conference on Supercomputing
Software versus hardware shared-memory implementation: a case study
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Reactive synchronization algorithms for multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Integration of message passing and shared memory in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
AP1000+: architectural support of PUT/GET interface for parallelizing compiler
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
LCM: memory system support for parallel language implementation
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance advantages of integrating block data transfer in cache-coherent multiprocessors
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The performance impact of flexibility in the Stanford FLASH multiprocessor
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fine-grain access control for distributed shared memory
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
The operating system kernel as a secure programmable machine
ACM SIGOPS Operating Systems Review
Software caching and computation migration in Olden
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient support for irregular applications on distributed-memory machines
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Reducing false sharing on shared memory multiprocessors through compile time data transformations
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Remote queues: exposing message queues for optimization and atomicity
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
A comprehensive bibliography of distributed shared memory
ACM SIGOPS Operating Systems Review
Efficient shared memory with minimal hardware support
ACM SIGARCH Computer Architecture News
Memory system performance of UNIX on CC-NUMA multiprocessors
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Recovery protocols for shared memory database systems
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
The MIT Alewife machine: architecture and performance
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Hive: fault containment for shared-memory multiprocessors
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Serverless network file systems
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Lazy release consistency for hardware-coherent multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Architectural mechanisms for explicit communication in shared memory multiprocessors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Empirical evaluation of the CRAY-T3D: a compiler perspective
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Architecture validation for processors
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The Chinook hardware/software co-synthesis system
ISSS '95 Proceedings of the 8th international symposium on System synthesis
ICS '95 Proceedings of the 9th international conference on Supercomputing
Evaluating the impact of advanced memory systems on compiler-parallelized codes
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Efficient validity checking for processor verification
ICCAD '95 Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design
Proceedings of the 28th annual international symposium on Microarchitecture
Serverless network file systems
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Teapot: language support for writing memory coherence protocols
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Decoupled hardware support for distributed shared memory
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
COMA: an opportunity for building fault-tolerant scalable shared memory multiprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Application and architectural bottlenecks in large scale distributed shared memory machines
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Increasing cache port efficiency for dynamic superscalar microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Coherent network interfaces for fine-grain communication
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Integrating performance monitoring and communication in parallel computers
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Synchronization and communication in the T3E multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
An integrated compile-time/run-time software distributed shared memory system
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Hiding communication latency and coherence overhead in software DSMs
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
SoftFLASH: analyzing the performance of clustered distributed virtual shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Operating system support for improving data locality on CC-NUMA compute servers
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Verification of FLASH cache coherence protocol by aggregation of distributed transactions
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Memory organization in multi-channel optical networks: NUMA and COMA revisited
ICS '96 Proceedings of the 10th international conference on Supercomputing
The SHRIMP performance monitor: design and applications
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
An Architecture for Tolerating Processor Failures in Shared-Memory Multiprocessors
IEEE Transactions on Computers
Validation coverage analysis for complex digital designs
Proceedings of the 1996 IEEE/ACM international conference on Computer-aided design
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Characterizing the Memory Behavior of Compiler-Parallelized Applications
IEEE Transactions on Parallel and Distributed Systems
Fusion of Loops for Parallelism and Locality
IEEE Transactions on Parallel and Distributed Systems
HFS: a performance-oriented flexible file system based on building-block compositions
ACM Transactions on Computer Systems (TOCS)
The interaction of parallel programming constructs and coherence protocols
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Ace: linguistic mechanisms for customizable protocols
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Optimizing communication in HPF programs on fine-grain distributed shared memory
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Shared-memory performance profiling
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Hardware fault containment in scalable shared-memory multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
Effects of communication latency, overhead, and bandwidth in a cluster architecture
Proceedings of the 24th annual international symposium on Computer architecture
Designing high bandwidth on-chip caches
Proceedings of the 24th annual international symposium on Computer architecture
Efficient synchronization: let them eat QOLB
Proceedings of the 24th annual international symposium on Computer architecture
Coherence controller architectures for SMP-based CC-NUMA multiprocessors
Proceedings of the 24th annual international symposium on Computer architecture
Reactive NUMA: a design for unifying S-COMA and CC-NUMA
Proceedings of the 24th annual international symposium on Computer architecture
Disco: running commodity operating systems on scalable multiprocessors
ACM Transactions on Computer Systems (TOCS)
Disco: running commodity operating systems on scalable multiprocessors
Proceedings of the sixteenth ACM symposium on Operating systems principles
Distributed schedule management in the Tiger video fileserver
Proceedings of the sixteenth ACM symposium on Operating systems principles
An interaction of coherence protocols and memory consistency models in DSM systems
ACM SIGOPS Operating Systems Review
Performance analysis on a CC-NUMA prototype
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
Design and implementation of the NUMAchine multiprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
Approximate reachability with BDDs using overlapping projections
DAC '98 Proceedings of the 35th annual Design Automation Conference
Validation with guided search of the state space
DAC '98 Proceedings of the 35th annual Design Automation Conference
Digital system simulation: methodologies and examples
DAC '98 Proceedings of the 35th annual Design Automation Conference
Support for Efficient Programming on the SB-PRAM
International Journal of Parallel Programming
Options for dynamic address translation in COMAs
Proceedings of the 25th annual international symposium on Computer architecture
Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors
Proceedings of the 25th annual international symposium on Computer architecture
Analytic evaluation of shared-memory systems with ILP processors
Proceedings of the 25th annual international symposium on Computer architecture
Adapting the Network Interface for High-Performance Computing: The CNI Approach
The Journal of Supercomputing - Special issue: high performance distributed computing
Protocol-based data-race detection
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms
IEEE Transactions on Parallel and Distributed Systems
Formal verification of complex coherence protocols using symbolic state models
Journal of the ACM (JACM)
PC-based Shared Memory Architecture and Language
The Journal of Supercomputing
25 years of the international symposia on Computer architecture (selected papers)
The MIT Alewife machine: architecture and performance
25 years of the international symposia on Computer architecture (selected papers)
Verification by approximate forward and backward reachability
Proceedings of the 1998 IEEE/ACM international conference on Computer-aided design
Evaluating the Effect of Coherence Protocols on the Performance of Parallel Programming Constructs
International Journal of Parallel Programming
Hardware Support for Flexible Distributed Shared Memory
IEEE Transactions on Computers
UTLB: a mechanism for address translation on network interfaces
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Tapeworm: high-level abstractions of shared accesses
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
IEEE Transactions on Computers - Special issue on cache memory and related problems
IEEE Transactions on Computers - Special issue on cache memory and related problems
Coherence Controller Architectures for Scalable Shared-Memory Multiprocessors
IEEE Transactions on Computers - Special issue on cache memory and related problems
Adapting cache line size to application behavior
ICS '99 Proceedings of the 13th international conference on Supercomputing
Realizing the performance potential of the virtual interface architecture
ICS '99 Proceedings of the 13th international conference on Supercomputing
Improved approximate reachability using auxiliary state variables
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Using partitioning to help convergence in the standard-cell design automation methodology
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Concurrent Event Handling through Multithreading
IEEE Transactions on Computers
Cellular Disco: resource management using virtual clusters on shared-memory multiprocessors
Proceedings of the seventeenth ACM symposium on Operating systems principles
Ace: a language for parallel programming with customizable protocols
ACM Transactions on Computer Systems (TOCS)
Performance experiences on Sun's Wildfire prototype
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
A high-level abstraction of shared accesses
ACM Transactions on Computer Systems (TOCS)
A case for user-level dynamic page migration
Proceedings of the 14th international conference on Supercomputing
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Interconnect scaling implications for CAD
ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
An Efficient and Scalable Approach for Implementing Fault-Tolerant DSM Architectures
IEEE Transactions on Computers
ACM Transactions on Computer Systems (TOCS)
IEEE Transactions on Parallel and Distributed Systems
Cellular Disco: resource management using virtual clusters on shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Architecture and design of AlphaServer GS320
ACM SIGPLAN Notices
FLASH vs. (simulated) FLASH: closing the simulation loop
ACM SIGPLAN Notices
Using meta-level compilation to check FLASH protocol code
ACM SIGPLAN Notices
Compiler-directed shared-memory communication for iterative parallel applications
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Using hardware performance monitors to isolate memory bottlenecks
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Exploiting Wavefront Parallelism on Large-Scale Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
High Bandwidth On-Chip Cache Design
IEEE Transactions on Computers
Parallelizing the Murϕ Verifier
Formal Methods in System Design - Special issue on CAV '97
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
FLASH vs. (Simulated) FLASH: closing the simulation loop
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Using meta-level compilation to check FLASH protocol code
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Using texture mapping with mipmapping to render a VLSI layout
Proceedings of the 38th annual Design Automation Conference
Proceedings of the 38th annual Design Automation Conference
A simple method for extracting models for protocol code
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The operating system kernel as a secure programmable machine
EW 6 Proceedings of the 6th workshop on ACM SIGOPS European workshop: Matching operating systems to application needs
ADir_pNB: A Cost-Effective Way to Implement Full Map Directory-Based Cache Coherence Protocols
IEEE Transactions on Computers
Optimizing software cache-coherent cluster architectures
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
StarT-Voyager: a flexible platform for exploring scalable SMP issues
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Deriving a simulation input generator and a coverage metric from a formal specification
Proceedings of the 39th annual Design Automation Conference
Leveraging cache coherence in active memory systems
ICS '02 Proceedings of the 16th international conference on Supercomputing
Deriving Efficient Cache Coherence Protocols Through Refinement
Formal Methods in System Design
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Counterexample-guided choice of projections in approximate symbolic model checking
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
The Journal of Supercomputing - Special issue on embedded fault-tolerance systems
An Application-Driven Study of Multicast Communication for Write Invalidation
The Journal of Supercomputing
Load Balancing for Parallel Query Execution on NUMA Multiprocessors
Distributed and Parallel Databases
Distributed Shared Memory: Concepts and Systems
IEEE Parallel & Distributed Technology: Systems & Technology
Using Formal Specifications for Functional Validation of Hardware Designs
IEEE Design & Test
Alleviating Consumption Channel Bottleneck in Wormhole-Routed k-ary n-Cube Systems
IEEE Transactions on Parallel and Distributed Systems
Analytic Evaluation of Shared-Memory Architectures
IEEE Transactions on Parallel and Distributed Systems
How Much Does Network Contention Affect Distributed Shared Memory Performance?
ICPP '97 Proceedings of the international Conference on Parallel Processing
Hardware Versus Software Implementation of COMA
ICPP '97 Proceedings of the international Conference on Parallel Processing
Automatic Partitioning of Data and Computations on Scalable Shared Memory Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
Exploiting the Capabilities of Communications Co-Processors
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Dag-Consistent Distributed Shared Memory
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
An Evaluation of a Commercial CC-NUMA Architecture: The CONVEX Exemplar SPP1200
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Aurora: Scoped Behavior for Per-Context Optimized Distributed Data Sharing
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Reducing Waiting Costs in User-Level Communication
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Coherent Block Data Transfer in the FLASH Multiprocessor
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Performance Analysis of a CC-NUMA Operating System
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Practical Network Applications on a Lightweight Active Management Environment
IWAN '01 Proceedings of the IFIP-TC6 Third International Working Conference on Active Networks
Compiler-Directed Cache Assist Adaptivity
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Model Checking Support for the ASM High-Level Language
TACAS '00 Proceedings of the 6th International Conference on Tools and Algorithms for Construction and Analysis of Systems: Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS 2000
How Can We Design Better Networks for DSM Systems?
PCRCW '97 Proceedings of the Second International Workshop on Parallel Computer Routing and Communication
Parameterized Verification of the FLASH Cache Coherence Protocol by Compositional Model Checking
CHARME '01 Proceedings of the 11th IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods
The Mobile Object Layer: A Run-Time Substrate for Mobile Adaptive Computations
ISCOPE '98 Proceedings of the Second International Symposium on Computing in Object-Oriented Parallel Environments
Towards a Methodology for Model Checking ASM: Lessons Learned from the FLASH Case Study
ASM '00 Proceedings of the International Workshop on Abstract State Machines, Theory and Applications
Locality Enhancement for Large-Scale Shared-Memory Multiprocessors
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Thread Migration and Load-Balancing in Heterogeneous Environments
LCR '00 Selected Papers from the 5th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
Processor Mechanisms for Software Shared Memory
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Active Memory Clusters: Efficient Multiprocessing on Commodity Clusters
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
SIGMA: a simulator infrastructure to guide memory analysis
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Interactive locality optimization on NUMA architectures
Proceedings of the 2003 ACM symposium on Software visualization
Cluster Queue Structure for Shared-Memory Multiprocessor Systems
The Journal of Supercomputing
Asynchronous Microengines for Efficient High-level Control
ARVLSI '97 Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
Software cache coherence for large scale multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Distance-Adaptive Update Protocols for Scalable Shared-Memory Multiprocessors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Protected, user-level DMA for the SHRIMP network interface
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Active I/O Switches in System Area Networks
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
CNI: A High-Performance Network Interface for Workstation Clusters
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Trojan: A High-Performance Simulator for Shared Memory Architectures
SS '96 Proceedings of the 29th Annual Simulation Symposium (SS '96)
Verification of an Industrial CC-NUMA Server
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
ISCC '00 Proceedings of the Fifth IEEE Symposium on Computers and Communications (ISCC 2000)
ICPP '00 Proceedings of the 2000 International Conference on Parallel Processing
Systematic Validation of Pipeline Interlock for Superscalar Microarchitectures
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Latency, Occupancy, and Bandwidth in DSM Multiprocessors: A Performance Evaluation
IEEE Transactions on Computers
Verifying Sequential Consistency on Shared-Memory Multiprocessors by Model Checking
IEEE Transactions on Parallel and Distributed Systems
The Impact of Negative Acknowledgments in Shared Memory Scientific Applications
IEEE Transactions on Parallel and Distributed Systems
Architectural Support for Uniprocessor and Multiprocessor Active Memory Systems
IEEE Transactions on Computers
SMTp: An Architecture for Next-generation Scalable Multi-threading
Proceedings of the 31st annual international symposium on Computer architecture
Exploring Virtual Network Selection Algorithms in DSM Cache Coherence Protocols
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
An ultra low-power processor for sensor networks
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Predicting the Performance of Synchronous Discrete Event Simulation
IEEE Transactions on Parallel and Distributed Systems
EMPS: An Environment for Memory Performance Studies
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Cache coherence support for non-shared bus architecture on heterogeneous MPSoCs
Proceedings of the 42nd annual Design Automation Conference
Moving Address Translation Closer to Memory in Distributed Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
TAPE: a transactional application profiling environment
Proceedings of the 19th annual international conference on Supercomputing
Hardware-modulated parallelism in chip multiprocessors
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Tight Bounds for Critical Sections in Processor Consistent Platforms
IEEE Transactions on Parallel and Distributed Systems
Programmable bus/memory controllers in modern computer architecture
Proceedings of the 43rd annual Southeast regional conference - Volume 1
TMA: a trap-based memory architecture
Proceedings of the 20th annual international conference on Supercomputing
Checking system rules using system-specific, programmer-written compiler extensions
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Message-driven relaxed consistency in a software distributed shared memory
OSDI '94 Proceedings of the 1st USENIX conference on Operating Systems Design and Implementation
Brazos: a third generation DSM system
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Implementation of a reliable remote memory pager
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
FLIPC: a low latency messaging system for distributed real time environments
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
WINSYM'99 Proceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3
Proceedings of the 21st annual international conference on Supercomputing
An Operational Semantics for Shared Messaging Communication
Electronic Notes in Theoretical Computer Science (ENTCS)
How low can you go?: recommendations for hardware-supported minimal TCB code execution
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Energy efficient scheduling for parallel applications on mobile clusters
Cluster Computing
The case for simple, visible cache coherency
Proceedings of the 2008 ACM SIGPLAN workshop on Memory systems performance and correctness: held in conjunction with the Thirteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '08)
A case for low-complexity MP architectures
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Direct Support for Model Checking Abstract State Machines by Utilizing Simulation
ABZ '08 Proceedings of the 1st international conference on Abstract State Machines, B and Z
Pangaea: a tightly-coupled IA32 heterogeneous chip multiprocessor
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
A comparative evaluation of hybrid distributed shared-memory systems
Journal of Systems Architecture: the EUROMICRO Journal
Automatic non-interference lemmas for parameterized model checking
Proceedings of the 2008 International Conference on Formal Methods in Computer-Aided Design
A memory system design framework: creating smart memories
Proceedings of the 36th annual international symposium on Computer architecture
Experience with building a commodity intel-based ccNUMA system
IBM Journal of Research and Development
High-throughput coherence control and hardware messaging in everest
IBM Journal of Research and Development
Using a configurable processor generator for computer architecture prototyping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Fault-tolerant mapping of a mesh network in a flexible hypercube
WSEAS Transactions on Computers
Efficient methods for formally verifying safety properties of hierarchical cache coherence protocols
Formal Methods in System Design
Fault-tolerant meshes and tori embedded in a faulty supercube
WSEAS Transactions on Computers
Proceedings of the Conference on Design, Automation and Test in Europe
MEDEA: a hybrid shared-memory/message-passing multiprocessor NoC-based architecture
Proceedings of the Conference on Design, Automation and Test in Europe
Toward reliable and efficient message passing software through formal analysis
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Exploiting locality: a flexible DSM approach
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
An analysis of Linux scalability to many cores
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Architectural Support for Fair Reader-Writer Locking
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A NoC-based hybrid message-passing/shared-memory approach to CMP design
Microprocessors & Microsystems
WSEAS Transactions on Information Science and Applications
On fault-tolerant embedding of meshes and tori in a flexible hypercube with unbounded expansion
WSEAS Transactions on Systems
Trust extension as a mechanism for secure code execution on commodity computers
Trust extension as a mechanism for secure code execution on commodity computers
Speeding-up synchronizations in DSM multiprocessors
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Manager-client pairing: a framework for implementing coherence hierarchies
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Parallel and distributed model checking in eddy
SPIN'06 Proceedings of the 13th international conference on Model Checking Software
Exploiting symmetry and transactions for partial order reduction of rule based specifications
SPIN'06 Proceedings of the 13th international conference on Model Checking Software
PARDIS: a programmable memory controller for the DDRx interfacing standards
Proceedings of the 39th Annual International Symposium on Computer Architecture
The Journal of Supercomputing
Computers and Electrical Engineering
A programmable memory controller for the DDRx interfacing standards
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.04 |
The FLASH multiprocessor efficiently integrates support for cache-coherent shared memory and high-performance message passing, while minimizing both hardware and software overhead. Each node in FLASH contains a microprocessor, a portion of the machine's global memory, a port to the interconnection network, an I/O interface, and a custom node controller called MAGIC. The MAGIC chip handles all communication both within the node and among nodes, using hardwired data paths for efficient data movement and a programmable processor optimized for executing protocol operations. The use of the protocol processor makes FLASH very flexible --- it can support a variety of different communication mechanisms --- and simplifies the design and implementation. This paper presents the architecture of FLASH and MAGIC, and discusses the base cache-coherence and message-passing protocols. Latency and occupancy numbers, which are derived from our system-level simulator and our Verilog code, are given for several common protocol operations. The paper also describes our software strategy and FLASH's current status.