Performance balancing: software-based on-chip memory management for effective CMP executions

Authors:
Naoto Fukumoto; Kenichi Imazato; Koji Inoue; Kazuaki Murakami
Affiliations:
Kyushu University, Nishi-ku, Fukuoka City, Japan (all authors)
Venue:
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
Year:
2009

Abstract

This paper proposes the concept of performance balancing and reports its performance impact on a chip multiprocessor (CMP). CMPs, which integrate multiple processor cores into a single chip, can achieve higher peak performance by exploiting thread-level parallelism. However, off-chip memory bandwidth, which does not scale with the number of cores, tends to limit the potential of CMPs. To address this issue, the technique proposed in this paper attempts to strike a good balance between computation and memory access. Unlike conventional parallel execution, this approach uses some of the cores to improve memory performance: these cores devote their on-chip memory resources to the remaining cores, which execute the parallelized threads. In our evaluation, our approach achieves a 31% performance improvement over a conventional parallel execution model for the evaluated program.
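The balancing idea can be illustrated with a toy analytical model. The sketch below is not the authors' method or evaluation setup. It assumes a hypothetical chip with `cores` identical cores, each owning `local_mem_bytes` of software-managed on-chip memory and sharing one off-chip `bandwidth`. It also assumes a hypothetical workload whose off-chip traffic shrinks linearly as helper cores donate their memory to hold its reused data (`shared_set_bytes`), and that computation and memory transfers overlap perfectly, so execution time is the larger of the two. Under these assumptions it tries every split between compute cores and memory-helper cores.

```python
from dataclasses import dataclass


@dataclass
class Chip:
    # Hypothetical parameters for a toy model, not values from the paper.
    cores: int
    local_mem_bytes: int   # software-managed on-chip memory per core
    bandwidth: float       # off-chip bytes per cycle, shared by all cores


@dataclass
class Workload:
    # Hypothetical parameters for a toy model, not values from the paper.
    compute_cycles: float  # total compute work, assumed perfectly parallel
    offchip_bytes: float   # off-chip traffic when no helper memory is donated
    shared_set_bytes: int  # reused data that helper cores could keep on chip


def exec_time(w: Workload, chip: Chip, compute_cores: int) -> float:
    """Estimated cycles when `compute_cores` run threads and the rest act as memory helpers."""
    helpers = chip.cores - compute_cores
    donated = helpers * chip.local_mem_bytes
    # Assumption: off-chip traffic falls linearly with the fraction of the
    # shared set that fits in donated on-chip memory.
    miss_fraction = max(0.0, 1.0 - donated / w.shared_set_bytes)
    compute_time = w.compute_cycles / compute_cores
    memory_time = w.offchip_bytes * miss_fraction / chip.bandwidth
    # Assumption: computation and memory transfers overlap perfectly.
    return max(compute_time, memory_time)


def best_split(w: Workload, chip: Chip) -> int:
    """Return the number of compute cores that minimizes estimated time (at least one)."""
    return min(range(1, chip.cores + 1), key=lambda k: exec_time(w, chip, k))


if __name__ == "__main__":
    chip = Chip(cores=8, local_mem_bytes=256 * 1024, bandwidth=4.0)
    work = Workload(compute_cycles=8_000_000, offchip_bytes=16_000_000,
                    shared_set_bytes=1024 * 1024)
    k = best_split(work, chip)
    print(f"compute cores: {k}, helper cores: {chip.cores - k}")
    print(f"all-compute time: {exec_time(work, chip, chip.cores):.0f} cycles")
    print(f"balanced time:    {exec_time(work, chip, k):.0f} cycles")
```

With these made-up numbers, using all eight cores for computation leaves the run bandwidth-bound, while dedicating three cores to memory service gives a lower estimated time. The model only shows why a balance point can exist. The actual benefit depends on how the helper cores manage on-chip memory and on the program's access pattern.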