The case for a single-chip multiprocessor
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance
Proceedings of the 31st annual international symposium on Computer architecture
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
The V-Way Cache: Demand Based Associativity via Global Replacement
Proceedings of the 32nd annual international symposium on Computer Architecture
Cooperative Caching for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Valgrind: a framework for heavyweight dynamic binary instrumentation
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Dex: high-performance exploration on large graphs for information retrieval
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Amdahl's Law in the Multicore Era
Computer
Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Scaling the bandwidth wall: challenges in and avenues for CMP scaling
Proceedings of the 36th annual international symposium on Computer architecture
Hi-index | 0.00 |
Processors have evolved to the now de-facto standard multicore architecture. The continuous advances in technology allow for increased component density, thus resulting in a larger number of cores on the chip. This, in turn, places pressure on the off-chip and pin bandwidth. Large Last-Level Caches (LLC), which are shared among all cores, have been used as a way to control the out-of-chip requests. In this work we focus on analyzing the memory behavior of a modern demanding application, a graph-based database workload, which is representative of future workloads. We analyze the performance of this application for different cache configurations in terms of: memory access time, bandwidth requirements, and power consumption. The experimental results show that the bandwidth requirements reduce as the number of clusters reduces and the LLC per cluster increases. This configuration is also the most power efficient. If on the other hand, memory latency is the dominant factor, assuming bandwidth is not a limitation, then the best configuration is the one with more clusters and smaller LLCs.