Cache coherence protocols: evaluation using a multiprocessor simulation model
ACM Transactions on Computer Systems (TOCS)
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Exploring the design space for a shared-cache multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The AlphaServer 8000 series: high-end server platform development
Digital Technical Journal - Special 10th anniversary issue
Real-time volume rendering on shared memory multiprocessors using the shear-warp factorization
PRS '95 Proceedings of the IEEE symposium on Parallel rendering
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The impact of architectural trends on operating system performance
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Proceedings of the 28th annual international symposium on Microarchitecture
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
The impact of shared-cache clustering in small-scale shared-memory multiprocessors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Rationale, Design and Performance of the Hydra Multiprocessor
Rationale, Design and Performance of the Hydra Multiprocessor
Performance Factors for Superscalar Processors
Performance Factors for Superscalar Processors
The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Using the SimOS machine simulator to study complex computer systems
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor
Proceedings of the 25th annual international symposium on Computer architecture
Performance of database workloads on shared-memory systems with out-of-order processors
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors
IEEE Transactions on Computers - Special issue on cache memory and related problems
A performance comparison of contemporary DRAM architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Hardware identification of cache conflict misses
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
The Multicluster Architecture: Reducing Processor Cycle Time Through Partitioning
International Journal of Parallel Programming
Using complete system simulation to characterize SPECjvm98 benchmarks
Proceedings of the 14th international conference on Supercomputing
Characterizing processor architectures for programmable network interfaces
Proceedings of the 14th international conference on Supercomputing
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Runtime identification of cache conflict misses: The adaptive miss buffer
ACM Transactions on Computer Systems (TOCS)
High-Performance DRAMs in Workstation Environments
IEEE Transactions on Computers
Proceedings of the 39th annual Design Automation Conference
Trident: a scalable architecture for scalar, vector, and matrix operations
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Performance-steered design of software architectures for embedded multicore systems
Software—Practice & Experience
Fine-grain design space exploration for a cartographic SoC multiprocessor
ACM SIGARCH Computer Architecture News
ARVLSI '97 Proceedings of the 17th Conference on Advanced Research in VLSI (ARVLSI '97)
Tuning data replication for improving behavior of MPSoC applications
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Dynamic on-chip memory management for chip multiprocessors
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Optimizing Array-Intensive Applications for On-Chip Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
A NUCA substrate for flexible CMP cache sharing
Proceedings of the 19th annual international conference on Supercomputing
Customized on-chip memories for embedded chip multiprocessors
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Selective code/data migration for reducing communication energy in embedded MpSoC architectures
GLSVLSI '06 Proceedings of the 16th ACM Great Lakes symposium on VLSI
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Spin Detection Hardware for Improved Management of Multithreaded Systems
IEEE Transactions on Parallel and Distributed Systems
Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Managing energy-performance tradeoffs for multithreaded applications on multiprocessor architectures
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Provably good multicore cache performance for divide-and-conquer algorithms
Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms
A shared cache for a chip multi vector processor
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Reusability-aware cache memory sharing for chip multiprocessors with private L2 caches
Journal of Systems Architecture: the EUROMICRO Journal
Using data compression for increasing memory system utilization
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Finding representative workloads for computer system design
Finding representative workloads for computer system design
A minimalist cache coherent MPSoC designed for FPGAs
International Journal of High Performance Systems Architecture
L2-Cache hierarchical organizations for multi-core architectures
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Hi-index | 0.01 |
In the future, advanced integrated circuit processing and packaging technology will allow for several design options for multiprocessor microprocessors. In this paper we consider three architectures: shared-primary cache, shared-secondary cache, and shared-memory. We evaluate these three architectures using a complete system simulation environment which models the CPU, memory hierarchy and I/O devices in sufficient detail to boot and run a commercial operating system. Within our simulation environment, we measure performance using representative hand and compiler generated parallel applications, and a multiprogramming workload. Our results show that when applications exhibit fine-grained sharing, both shared-primary and shared-secondary architectures perform similarly when the full costs of sharing the primary cache are included.