Cache coherence protocols: evaluation using a multiprocessor simulation model
ACM Transactions on Computer Systems (TOCS)
Multiprocessor cache analysis using ATUM
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
IEEE Spectrum
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The DASH prototype: implementation and performance
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The design and analysis of DASH: a scalable directory-based multiprocessor
The design and analysis of DASH: a scalable directory-based multiprocessor
Simulation of multiprocessors: accuracy and performance
Simulation of multiprocessors: accuracy and performance
Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The performance of cache-coherent ring-based multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
ACM Computing Surveys (CSUR)
Portable Programs for Parallel Processors
Portable Programs for Parallel Processors
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Evaluation of design alternatives for a multiprocessor microprocessor
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Multithreading with Distributed Functional Units
IEEE Transactions on Computers
A performance evaluation of cluster architectures
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading
ACM Transactions on Computer Systems (TOCS)
Multiplex: unifying conventional and speculative thread-level parallelism on a chip multiprocessor
ICS '01 Proceedings of the 15th international conference on Supercomputing
Comparing power consumption of an SMT and a CMP DSP for mobile phone workloads
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Multiprocessors from a Software Perspective
IEEE Micro
The impact of shared-cache clustering in small-scale shared-memory multiprocessors
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Scheduling Algorithms for Effective Thread Pairing on Hybrid Multiprocessors
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Optimizing inter-processor data locality on embedded chip multiprocessors
Proceedings of the 5th ACM international conference on Embedded software
Static cache partitioning robustness analysis for embedded on-chip multi-processors
Proceedings of the 3rd conference on Computing frontiers
Code restructuring for improving cache performance of MPSoCs
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Dynamic partitioning of processing and memory resources in embedded MPSoC architectures
Proceedings of the conference on Design, automation and test in Europe: Proceedings
A flexible data to L2 cache mapping approach for future multicore processors
Proceedings of the 2006 workshop on Memory system performance and correctness
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Compositional, dynamic cache management for embedded chip multiprocessors
Proceedings of the conference on Design, automation and test in Europe
An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers I
Static Cache Partitioning Robustness Analysis for Embedded On-Chip Multi-processors
Transactions on High-Performance Embedded Architectures and Compilers I
Compositional, Dynamic Cache Management for Embedded Chip Multiprocessors
Journal of Signal Processing Systems
A minimalist cache coherent MPSoC designed for FPGAs
International Journal of High Performance Systems Architecture
Optimizing integrated application performance with cache-aware metascheduling
OTM'11 Proceedings of the 2011th Confederated international conference on On the move to meaningful internet systems - Volume Part II
Parallel matrix multiplication based on dynamic SMP clusters in SoC technology
ISPA'07 Proceedings of the 2007 international conference on Frontiers of High Performance Computing and Networking
Hi-index | 0.00 |
In the near future, semiconductor technology will allow the integration of multiple processors on a chip or multichip-module (MCM). In this paper we investigate the architecture and partitioning of resources between processors and cache memory for single chip and MCM-based multiprocessors. We study the performance of a cluster-based multiprocessor architecture in which processors within a cluster are tightly coupled via a shared cluster cache for various processor-cache configurations. Our results show that for parallel applications, clustering via shared caches provides an effective mechanism for increasing the total number of processors in a system, without increasing the number of invalidations. Combining these results with cost estimates for shared cluster cache implementations leads to two conclusions: 1) For a four cluster multiprocessor with single chip clusters, two processors per cluster with a smaller cache provides higher performance and better cost/performance than a single processor with a larger cache and 2) this four cluster configuration can be scaled linearly in performance by adding processors to each cluster using MCM packaging techniques.