The memory wall is a well-known obstacle to improving processor performance, and the arrival of many-core processors will further exacerbate the problem. Efficient scheduling of memory tasks has therefore become an important means of sustaining performance growth. In this paper, we first develop an analytical model that captures the essence of on-chip computation and off-chip communication as expressed in the stream programming model. The model estimates the potential speedup achievable by restricting the number of simultaneous memory tasks in order to reduce memory bandwidth contention. We then corroborate the analytical model with experimental results from task scheduling on real hardware. The correlation between the analytical and experimental results offers both insight into the behavior of the benchmarks on the hardware and opportunities to extend the analytical model. Our results show that restricting the number of simultaneous memory tasks achieves up to 60% performance improvement on a pool of synthetic workloads.
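The idea of restricting the number of simultaneous memory tasks can be illustrated with a small sketch. This is not the scheduler or the analytical model from the paper, neither of which is given here. It is a generic counting-semaphore throttle, and the names `MemoryTaskThrottle`, `copy_block`, and `run_workload` are hypothetical helpers introduced only for illustration. The block copy stands in for an off-chip memory transfer.

```python
import threading


class MemoryTaskThrottle:
    """Hypothetical helper: caps how many memory tasks run at the same time."""

    def __init__(self, max_memory_tasks):
        # Counting semaphore: one slot per memory task allowed in flight.
        self._slots = threading.Semaphore(max_memory_tasks)
        self._lock = threading.Lock()
        self._active = 0
        self.peak_active = 0  # highest concurrency observed, for inspection

    def run_memory_task(self, task, *args):
        # Block until a slot is free, then run the task while holding it.
        with self._slots:
            with self._lock:
                self._active += 1
                self.peak_active = max(self.peak_active, self._active)
            try:
                return task(*args)
            finally:
                with self._lock:
                    self._active -= 1


def copy_block(src, dst, start, stop):
    # Stand-in for a memory task (assumption: a bulk copy of one block).
    dst[start:stop] = src[start:stop]


def run_workload(max_memory_tasks, num_tasks=8, block=1024):
    """Run num_tasks block copies with at most max_memory_tasks in flight."""
    src = list(range(num_tasks * block))
    dst = [0] * len(src)
    throttle = MemoryTaskThrottle(max_memory_tasks)
    threads = [
        threading.Thread(
            target=throttle.run_memory_task,
            args=(copy_block, src, dst, i * block, (i + 1) * block),
        )
        for i in range(num_tasks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dst == src, throttle.peak_active
```

Calling `run_workload(2)` issues eight copy tasks while allowing at most two to proceed at once. The sketch only demonstrates the throttling mechanism; whether a given limit actually improves performance depends on the hardware and workload, which is the question the analytical model and experiments address.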