Vector access performance in parallel memories using skewed storage scheme
IEEE Transactions on Computers
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Code generation for streaming: an access/execute mechanism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Pseudo-randomly interleaved memory
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparative evaluation of latency reducing and tolerating techniques
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Increasing the number of strides for conflict-free vector access
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
A comparison of three current superscalar designs
ACM SIGARCH Computer Architecture News
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
The Chinese remainder theorem and the prime memory system
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Increasing memory bandwidth for vector computations
Proceedings of the international conference on Programming languages and system architectures
Access ordering and effective memory bandwidth
Access ordering and effective memory bandwidth
Blocking Linear Algebra Codes for Memory Hierarchies
Proceedings of the Fourth SIAM Conference on Parallel Processing for Scientific Computing
Performance of the IPSC/860 Node Architecture
Performance of the IPSC/860 Node Architecture
Hardware Support for Dynamic Access Ordering: Performance of Some Design Options
Hardware Support for Dynamic Access Ordering: Performance of Some Design Options
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
The Organization and Use of Parallel Memories
IEEE Transactions on Computers
Design and evaluation of dynamic access ordering hardware
ICS '96 Proceedings of the 10th international conference on Supercomputing
Increasing TLB reach using superpages backed by shadow memory
Proceedings of the 25th annual international symposium on Computer architecture
A performance comparison of contemporary DRAM architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Hardware-only stream prefetching and dynamic access ordering
Proceedings of the 14th international conference on Supercomputing
Proceedings of the 27th annual international symposium on Computer architecture
Dynamic Access Ordering for Streamed Computations
IEEE Transactions on Computers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
IEEE Transactions on Computers
High-Performance DRAMs in Workstation Environments
IEEE Transactions on Computers
Designing a Modern Memory Hierarchy with Hardware Prefetching
IEEE Transactions on Computers
Memory System Support for Irregular Applications
LCR '98 Selected Papers from the 4th International Workshop on Languages, Compilers, and Run-Time Systems for Scalable Computers
DRAM-Page Based Prediction and Prefetching
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Custom Data Layout for Memory Parallelism
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Locality-conscious workload assignment for array-based computations in MPSOC architectures
Proceedings of the 42nd annual Design Automation Conference
Impulse: Memory system support for scientific applications
Scientific Programming
A memory-conscious code parallelization scheme
Proceedings of the 44th annual Design Automation Conference
Designing packet buffers for router linecards
IEEE/ACM Transactions on Networking (TON)
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices
Proceedings of the 36th annual international symposium on Computer architecture
Physically addressed queueing (PAQ): improving parallelism in solid state disks
Proceedings of the 39th Annual International Symposium on Computer Architecture
Hi-index | 0.01 |
As processor speeds increase relative to memory speeds, memory bandwidth is rapidly becoming the limiting performance, factor for many applications. Several approaches to bridging this performance gap have been suggested. This paper examines one approach, access ordering, and pushes its limits to determine bounds on memory performance. We present several access-ordering schemes, and compare their performance, developing analytic models and partially validating these with benchmark timings on the Intel i860XR.