This paper analyzes memory access scheduling and virtual channels as mechanisms to reduce the latency of main memory accesses by the CPU and peripherals in web servers. Despite the address filtering performed by the CPU's cache hierarchy, there is significant locality and bank parallelism in the DRAM access stream of a web server, which includes traffic from the operating system, the application, and peripherals. However, a sequential memory controller leaves much of this locality and parallelism unexploited, because serialization and bank conflicts increase the latency actually realized. Aggressive scheduling within the memory controller to exploit the available parallelism and locality can reduce the average read latency of the SDRAM. However, bank conflicts and the limited ability of the SDRAM's internal row buffers to act as a cache hinder further latency reduction. Virtual channel SDRAM overcomes these limitations by providing a set of channel buffers that can hold segments from rows of any internal SDRAM bank. This paper presents memory controller policies that can make effective use of these channel buffers to further reduce the average read latency of the SDRAM.
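The contrast between per-bank row buffers and a shared pool of channel buffers, and the effect of reordering requests, can be illustrated with a toy latency model. This is a minimal sketch under loudly stated assumptions, not the paper's simulator or its controller policies: the timings `T_HIT` and `T_MISS` are hypothetical, requests are served one at a time (so the model captures row locality and bank conflicts but not overlap across banks), a `(bank, row)` pair stands in for a channel segment, and channels use LRU replacement. The names `RowBuffers`, `ChannelBuffers`, and `serve` are illustrative only. The `hit_first` option is a first-ready-style policy, not necessarily the one evaluated in the paper.

```python
from collections import OrderedDict

# Hypothetical timings in controller cycles (illustrative assumptions only).
T_HIT = 2   # column access to data already held in a row buffer or channel
T_MISS = 6  # activate (plus precharge) and column access; charged even for an idle bank


class RowBuffers:
    """Conventional SDRAM: each bank caches exactly one open row."""

    def __init__(self):
        self.open_rows = {}  # bank -> currently open row

    def hits(self, bank, row):
        return self.open_rows.get(bank) == row

    def access(self, bank, row):
        latency = T_HIT if self.hits(bank, row) else T_MISS
        self.open_rows[bank] = row  # the accessed row is left open
        return latency


class ChannelBuffers:
    """Virtual-channel-style pool: any channel may hold data from any bank.

    Assumption: LRU replacement, and a (bank, row) pair stands in for a segment.
    """

    def __init__(self, num_channels=4):
        self.num_channels = num_channels
        self.channels = OrderedDict()  # (bank, row) -> None, least recent first

    def hits(self, bank, row):
        return (bank, row) in self.channels

    def access(self, bank, row):
        key = (bank, row)
        if key in self.channels:
            self.channels.move_to_end(key)  # mark as most recently used
            return T_HIT
        if len(self.channels) == self.num_channels:
            self.channels.popitem(last=False)  # evict least recently used
        self.channels[key] = None
        return T_MISS


def serve(requests, buffers, hit_first=False):
    """Serve (bank, row) requests one at a time; return completion times.

    hit_first=False models a sequential, in-order controller.
    hit_first=True picks the oldest pending request that hits a buffer,
    falling back to the oldest pending request.
    """
    pending = list(range(len(requests)))
    done = [0] * len(requests)
    clock = 0
    while pending:
        pick = pending[0]
        if hit_first:
            pick = next((i for i in pending if buffers.hits(*requests[i])), pick)
        clock += buffers.access(*requests[pick])
        done[pick] = clock
        pending.remove(pick)
    return done


# Rows 10 and 20 conflict in bank 0, so a single row buffer thrashes.
reqs = [(0, 10), (0, 20), (0, 10), (0, 20), (1, 5), (0, 10)]
for name, buf, hf in [("in-order, row buffers", RowBuffers(), False),
                      ("hit-first, row buffers", RowBuffers(), True),
                      ("in-order, channels", ChannelBuffers(), False)]:
    times = serve(reqs, buf, hf)
    print(f"{name}: mean completion {sum(times) / len(times):.2f} cycles")
```

With these made-up timings the three configurations print mean completion times of 21.00, 13.67, and 15.67 cycles. The numbers say nothing about the paper's results; they only show the mechanisms: reordering turns conflicting accesses into row hits, while channel buffers keep both conflicting rows available even without reordering.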