An important issue for current multi-core processors is the sharing of off-chip bandwidth. Sharing helps improve resource utilization, but more importantly, it may cause performance degradation due to contention. However, there is not enough research on characterizing workloads from a bandwidth perspective. Moreover, the impact of the bandwidth constraint on performance is still poorly understood. In this paper, we propose the phase execution model and evaluate the arithmetic to memory ratio (AMR) of each phase to characterize the bandwidth requirements of arbitrary programs. We apply the model to a set of SPEC benchmark programs and obtain two results. First, we propose a new taxonomy of workloads based on their bandwidth requirements. Second, we find that prefetching techniques improve the system throughput of multi-core processors only when there is enough spare memory bandwidth.
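As a rough illustration of the per-phase AMR idea, the sketch below computes a ratio for each phase of a program. The abstract does not define AMR precisely, so the definition used here (arithmetic operations divided by memory accesses) is an assumption, and the `Phase` record, its counter fields, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    # Hypothetical per-phase counters; the paper's actual inputs are not given here.
    name: str
    arithmetic_ops: int    # arithmetic instructions executed in the phase
    memory_accesses: int   # off-chip memory accesses issued in the phase


def amr(phase: Phase) -> float:
    """Arithmetic to memory ratio, ASSUMED to be ops per memory access."""
    if phase.memory_accesses == 0:
        return float("inf")  # a phase with no memory traffic needs no bandwidth
    return phase.arithmetic_ops / phase.memory_accesses


def phase_amrs(phases):
    """Map each phase name to its AMR."""
    return {p.name: amr(p) for p in phases}


# Made-up counters for three phases of an imaginary program.
phases = [
    Phase("init", 1000, 500),
    Phase("compute", 9000, 300),
    Phase("flush", 200, 400),
]
ratios = phase_amrs(phases)

# Under the assumed definition, the lowest AMR marks the phase that
# issues the most memory traffic per unit of arithmetic work.
most_bandwidth_hungry = min(ratios, key=ratios.get)
```

Under this assumed definition, a low AMR means more memory traffic per unit of computation, so such phases are the ones most likely to suffer when off-chip bandwidth is contended.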