"STiNG" is a Cache Coherent Non-Uniform Memory Access (CC-NUMA) multiprocessor designed and built by Sequent Computer Systems, Inc. It combines four-processor Symmetric Multiprocessor (SMP) nodes, called Quads, using a coherent interconnect based on the Scalable Coherent Interface (SCI). The Quads are based on the Intel P6 processor and the external bus it defines. In addition to four P6 processors, each Quad may contain up to 4 GB of system memory, two Peripheral Component Interconnect (PCI) buses for I/O, and a Lynx board. The Lynx board provides the datapath to the SCI-based interconnect and maintains cache coherence across the whole system. STiNG is one of the first commercial CC-NUMA systems to be built. This paper describes the motivation for building STiNG as well as its architecture and implementation. It also presents a performance analysis for On-Line Transaction Processing (OLTP) and Decision Support System (DSS) workloads. Finally, it reviews the status of the current implementation.
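The following Python sketch makes the node organization concrete. It models a system built from Quads using only the per-Quad figures given above: four processors and up to 4 GB of memory. It then classifies a memory access as local or remote. The contiguous per-Quad address partitioning, the `System` class and its methods, and the 8-Quad example are hypothetical illustrations. The abstract does not describe STiNG's actual physical address map or its maximum Quad count.

```python
from dataclasses import dataclass

# Per-Quad figures from the abstract: 4 P6 processors, up to 4 GB of memory.
CPUS_PER_QUAD = 4
MAX_MEM_PER_QUAD = 4 * 2**30  # 4 GB, interpreted here as 2**32 bytes


@dataclass(frozen=True)
class System:
    """Hypothetical model of a CC-NUMA machine built from Quads."""
    num_quads: int
    # Assumption: every Quad holds the same amount of memory.
    mem_per_quad: int = MAX_MEM_PER_QUAD

    def __post_init__(self):
        if self.num_quads < 1:
            raise ValueError("need at least one Quad")
        if not 0 < self.mem_per_quad <= MAX_MEM_PER_QUAD:
            raise ValueError("per-Quad memory must be in (0, 4 GB]")

    @property
    def total_cpus(self) -> int:
        return self.num_quads * CPUS_PER_QUAD

    @property
    def total_memory(self) -> int:
        return self.num_quads * self.mem_per_quad

    def home_quad(self, phys_addr: int) -> int:
        """Quad whose local memory holds phys_addr.

        Assumes (hypothetically) that Quad i owns the contiguous range
        [i * mem_per_quad, (i + 1) * mem_per_quad).
        """
        if not 0 <= phys_addr < self.total_memory:
            raise ValueError("address outside installed memory")
        return phys_addr // self.mem_per_quad

    def quad_of_cpu(self, cpu: int) -> int:
        """Quad containing global CPU number cpu (CPUs numbered Quad by Quad)."""
        if not 0 <= cpu < self.total_cpus:
            raise ValueError("no such CPU")
        return cpu // CPUS_PER_QUAD

    def is_local(self, cpu: int, phys_addr: int) -> bool:
        """True if the access stays on the CPU's own Quad bus.

        False means the access must leave the Quad through its Lynx board
        and cross the SCI-based interconnect.
        """
        return self.quad_of_cpu(cpu) == self.home_quad(phys_addr)
```

For a hypothetical 8-Quad system, `System(num_quads=8).total_cpus` is 32. Under the assumed layout, an access by CPU 0 to physical address 5 GB is remote because that address falls in Quad 1's memory, while the same access by CPU 5 (also in Quad 1) is local. The sketch only illustrates the non-uniformity itself: local accesses stay on the Quad's own P6 bus, and remote ones pay the extra cost of the Lynx board and interconnect.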