Missing the memory wall: the case for processor/memory integration
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
A bandwidth-efficient architecture for media processing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Performance of image and video processing with general-purpose processors and media ISA extensions
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A performance comparison of contemporary DRAM architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
IEEE Micro
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
A Characterization of Processor Performance in the vax-11/780
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Access ordering and memory-conscious cache utilization
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Access Order and Effective Bandwidth for Streams on a Direct Rambus Memory
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Impulse: Building a Smarter Memory Controller
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Command Vector Memory Systems: High Performance at Low Cost
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Polygon rendering on a stream architecture
HWWS '00 Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Access pattern based local memory customization for low power embedded systems
Proceedings of the conference on Design, automation and test in Europe
New directions in compiler technology for embedded systems (embedded tutorial)
Proceedings of the 2001 Asia and South Pacific Design Automation Conference
A study of memory system performance of multimedia applications
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Designing a Modern Memory Hierarchy with Hardware Prefetching
IEEE Transactions on Computers
Comparing Reyes and OpenGL on a stream architecture
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Imagine: Media Processing with Streams
IEEE Micro
Analysis of performance bottlenecks in multithreaded multiprocessor systems
Fundamenta Informaticae - Application of concurrency to system design
A 64Mbit Mesochronous Hybrid Wave Pipelined Multibank DRAM Macro
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Aggressive Memory-Aware Compilation
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
A network flow approach to memory bandwidth utilization in embedded DSP core processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Xtream-Fit: an energy-delay efficient data memory subsystem for embedded media processing
Proceedings of the 40th annual Design Automation Conference
Exploring the VLSI Scalability of Stream Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Efficient use of memory bandwidth to improve network processor throughput
Proceedings of the 30th annual international symposium on Computer architecture
An Analysis of Cache Performance of Multimedia Applications
IEEE Transactions on Computers
Adaptive History-Based Memory Schedulers
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Memory Controller Optimizations for Web Servers
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Micro-architecture techniques in the intel® E8870 scalable memory controller
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
A study of performance impact of memory controller features in multi-processor server environment
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Traffic shaping for an FPGA based SDRAM controller with complex QoS requirements
Proceedings of the 42nd annual Design Automation Conference
Database hash-join algorithms on multithreaded computer architectures
Proceedings of the 3rd conference on Computing frontiers
Exploiting locality to ameliorate packet queue contention and serialization
Proceedings of the 3rd conference on Computing frontiers
The bit-reversal SDRAM address mapping
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
The design space of data-parallel memory systems
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Virtually Pipelined Network Memory
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Improving fairness, throughput and energy-efficiency on a chip multiprocessor through DVFS
ACM SIGARCH Computer Architecture News
A 64-bit stream processor architecture for scientific applications
Proceedings of the 34th annual international symposium on Computer architecture
QoS policies and architecture for cache/memory in CMP platforms
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A high-end real-time digital film processing reconfigurable platform
EURASIP Journal on Embedded Systems
Predator: a predictable SDRAM memory controller
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Memory scheduling for modern microprocessors
ACM Transactions on Computer Systems (TOCS)
Improving SDRAM access energy efficiency for low-power embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Memory performance attacks: denial of memory service in multi-core systems
SS'07 Proceedings of 16th USENIX Security Symposium on USENIX Security Symposium
Exploiting program cyclic behavior to reduce memory latency in embedded processors
Proceedings of the 2008 ACM symposium on Applied computing
A small data cache for multimedia-oriented embedded systems
Journal of Systems Architecture: the EUROMICRO Journal
Optimizing thread throughput for multithreaded workloads on memory constrained CMPs
Proceedings of the 5th conference on Computing frontiers
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
3D-Stacked Memory Architectures for Multi-core Processors
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Designing packet buffers for router linecards
IEEE/ACM Transactions on Networking (TON)
Distributed order scheduling and its application to multi-core dram controllers
Proceedings of the twenty-seventh ACM symposium on Principles of distributed computing
Memory organization with multi-pattern parallel accesses
Proceedings of the conference on Design, automation and test in Europe
Power Aware External Bus Arbitration for System-on-a-Chip Embedded Systems
Transactions on High-Performance Embedded Architectures and Compilers I
Efficient Memory Utilization for High-Speed FPGA-Based Hardware Emulators with SDRAMs
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Prefetch-Aware DRAM Controllers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Mini-rank: Adaptive DRAM architecture for improving memory power efficiency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
A light-weight fairness mechanism for chip multiprocessor memory systems
Proceedings of the 6th ACM conference on Computing frontiers
Dynamic warp formation: Efficient MIMD control flow on SIMD graphics hardware
ACM Transactions on Architecture and Code Optimization (TACO)
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices
Proceedings of the 36th annual international symposium on Computer architecture
Buffer management for colored packets with deadlines
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
High-Performance Buffer Mapping to Exploit DRAM Concurrency in Multiprocessor DSP Systems
RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping
Application development with the FlexWAFE real-time stream processing architecture for FPGAs
ACM Transactions on Embedded Computing Systems (TECS)
Prediction in Dynamic SDRAM Controller Policies
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Exploiting Locality on the Cell/B.E. through Bypassing
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
High-bandwidth network memory system through virtual pipelines
IEEE/ACM Transactions on Networking (TON)
An SDRAM-aware router for Networks-on-Chip
Proceedings of the 46th Annual Design Automation Conference
Future scaling of processor-memory interfaces
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Complexity effective memory access scheduling for many-core accelerator architectures
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Coordinated control of multiple prefetchers in multi-core systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Improving memory bank-level parallelism in the presence of prefetching
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Micro-pages: increasing DRAM efficiency with locality-aware data placement
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Implementing and optimizing a data-intensive hydrodynamics application on the stream processor
ICCSA'07 Proceedings of the 2007 international conference on Computational science and its applications - Volume Part III
The virtual write queue: coordinating DRAM and last-level cache policies
Proceedings of the 37th annual international symposium on Computer architecture
A Low-Latency and Memory-Efficient On-chip Network
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
A Network Congestion-Aware Memory Controller
NOCS '10 Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Handling the problems and opportunities posed by multiple on-chip memory controllers
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
An SDRAM-aware router for networks-on-chip
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems - Special section on the ACM IEEE international conference on formal methods and models for codesign (MEMOCODE) 2009
Software-hardware cooperative DRAM bank partitioning for chip multiprocessors
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Elastic Refresh: Techniques to Mitigate Refresh Penalties in High Density Memory
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A platform for high level synthesis of memory-intensive image processing algorithms
Proceedings of the 19th ACM/SIGDA international symposium on Field programmable gate arrays
ARCS'11 Proceedings of the 24th international conference on Architecture of computing systems
Making the Best of Temporal Locality: Just-in-Time Renaming and Lazy Write-Back on the Cell/B.E
International Journal of High Performance Computing Applications
FPGA implementation & comparison of current trends in memory scheduler for multimedia application
Proceedings of the International Conference & Workshop on Emerging Trends in Technology
Application specific memory access, reuse and reordering for SDRAM
ARC'11 Proceedings of the 7th international conference on Reconfigurable computing: architectures, tools and applications
Scalable QoS-aware memory controller for high-bandwidth packet memory
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Performance limitations of block-multithreaded distributed-memory systems
Winter Simulation Conference
Page placement in hybrid memory systems
Proceedings of the international conference on Supercomputing
Poster: revisiting virtual channel memory for performance and fairness on multi-core architecture
Proceedings of the international conference on Supercomputing
Prefetch-aware shared resource management for multi-core systems
Proceedings of the 38th annual international symposium on Computer architecture
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput
Proceedings of the 38th annual international symposium on Computer architecture
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Elastic pipeline: addressing GPU on-chip shared memory bank conflicts
Proceedings of the 8th ACM International Conference on Computing Frontiers
T&I engine: traversal and intersection engine for hardware accelerated ray tracing
Proceedings of the 2011 SIGGRAPH Asia Conference
Power management of hybrid DRAM/PRAM-based main memory
Proceedings of the 48th Design Automation Conference
Matching cache access behavior and bit error pattern for high performance low Vcc L1 cache
Proceedings of the 48th Design Automation Conference
Bandwidth constrained coordinated HW/SW prefetching for multicores
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
A bursty multi-port memory controller with quality-of-service guarantees
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Exploring the limits of GPGPU scheduling in control flow bound applications
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Utilizing RF-I and intelligent scheduling for better throughput/watt in a mobile GPU memory system
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Memory access schedule minimization for embedded systems
Journal of Systems Architecture: the EUROMICRO Journal
Power aware external bus arbitration for system-on-a-chip embedded systems
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Improving System Energy Efficiency with Memory Rank Subsetting
ACM Transactions on Architecture and Code Optimization (TACO)
Hierarchical memory scheduling for multimedia MPSoCs
Proceedings of the International Conference on Computer-Aided Design
A fair thread-aware memory scheduling algorithm for chip multiprocessor
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Optimizing SDRAM bandwidth for custom FPGA loop accelerators
Proceedings of the ACM/SIGDA international symposium on Field Programmable Gate Arrays
Minimalist open-page: a DRAM page-mode scheduling policy for the many-core era
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A resistive TCAM accelerator for data-intensive computing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A register-file approach for row buffer caches in die-stacked DRAMs
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Parallel application memory scheduling
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing memory interference in multicore systems via application-aware memory channel partitioning
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Computer Systems (TOCS)
A high performance adaptive miss handling architecture for chip multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers IV
SRP: symbiotic resource partitioning of the memory hierarchy in CMPs
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
DIEF: an accurate interference feedback mechanism for chip multiprocessor memory systems
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Deterministic execution model on COTS hardware
ARCS'12 Proceedings of the 25th international conference on Architecture of Computing Systems
A QoS-aware memory controller for dynamically balancing GPU and CPU bandwidth use in an MPSoC
Proceedings of the 49th Annual Design Automation Conference
Proceedings of the 49th Annual Design Automation Conference
Hybrid DRAM/PRAM-based main memory for single-chip CPU/GPU
Proceedings of the 49th Annual Design Automation Conference
Write performance improvement by hiding R drift latency in phase-change RAM
Proceedings of the 49th Annual Design Automation Conference
Rank idle time prediction driven last-level cache writeback
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Trace-driven simulation of memory system scheduling in multithread application
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities
Proceedings of the 26th ACM international conference on Supercomputing
Unified memory optimizing architecture: memory subsystem control with a unified predictor
Proceedings of the 26th ACM international conference on Supercomputing
DRAM power-aware rank scheduling
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
RAIDR: Retention-Aware Intelligent DRAM Refresh
Proceedings of the 39th Annual International Symposium on Computer Architecture
PARDIS: a programmable memory controller for the DDRx interfacing standards
Proceedings of the 39th Annual International Symposium on Computer Architecture
BOOM: enabling mobile memory based low-power server DIMMs
Proceedings of the 39th Annual International Symposium on Computer Architecture
Improving writeback efficiency with decoupled last-write prediction
Proceedings of the 39th Annual International Symposium on Computer Architecture
A case for exploiting subarray-level parallelism (SALP) in DRAM
Proceedings of the 39th Annual International Symposium on Computer Architecture
Physically addressed queueing (PAQ): improving parallelism in solid state disks
Proceedings of the 39th Annual International Symposium on Computer Architecture
Staged memory scheduling: achieving high performance and scalability in heterogeneous systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
The dynamic granularity memory system
Proceedings of the 39th Annual International Symposium on Computer Architecture
Optimizing datacenter power with memory system levers for guaranteed quality-of-service
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
A software memory partition approach for eliminating bank-level interference in multicore systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Analysis of Performance Bottlenecks in Multithreaded Multiprocessor Systems
Fundamenta Informaticae - Application of Concurrency to System Design
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
A distributed interleaving scheme for efficient access to WideIO DRAM memory
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Implementation and optimization of sparse matrix-vector multiplication on imagine stream processor
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Analytical synthesis of bandwidth-efficient SDRAM address generators
Microprocessors & Microsystems
Timing effects of DDR memory systems in hard real-time multicore architectures: Issues and solutions
ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
High-performance and low-energy buffer mapping method for multiprocessor DSP systems
ACM Transactions on Embedded Computing Systems (TECS)
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Predicting Performance Impact of DVFS for Realistic Memory Systems
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Addressing End-to-End Memory Access Latency in NoC-Based Multicores
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Conservative row activation to improve memory power efficiency
Proceedings of the 27th international ACM conference on International conference on supercomputing
CMP off-chip bandwidth scheduling guided by instruction criticality
Proceedings of the 27th international ACM conference on International conference on supercomputing
Prefetching and cache management using task lifetimes
Proceedings of the 27th international ACM conference on International conference on supercomputing
Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems
Proceedings of the 40th Annual International Symposium on Computer Architecture
Improving memory scheduling via processor-side load criticality information
Proceedings of the 40th Annual International Symposium on Computer Architecture
AC-DIMM: associative computing with STT-MRAM
Proceedings of the 40th Annual International Symposium on Computer Architecture
Orchestrated scheduling and prefetching for GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Reducing memory access latency with asymmetric DRAM bank organizations
Proceedings of the 40th Annual International Symposium on Computer Architecture
A network congestion-aware memory subsystem for manycore
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on Wireless Health Systems, On-Chip and Off-Chip Network Architectures
Pragmatic integration of an SRAM row cache in heterogeneous 3-D DRAM architecture using TSV
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
SGRT: a mobile GPU architecture for real-time ray tracing
Proceedings of the 5th High-Performance Graphics Conference
Distributed fair DRAM scheduling in network-on-chips architecture
Journal of Systems Architecture: the EUROMICRO Journal
Designing on-chip networks for throughput accelerators
ACM Transactions on Architecture and Code Optimization (TACO)
Effect of page frame allocation pattern on bank conflicts in multi-core systems
Proceedings of the 2013 Research in Adaptive and Convergent Systems
Memory-centric system interconnect design with hybrid memory cubes
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Reshaping cache misses to improve row-buffer locality in multicore systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
A programmable memory controller for the DDRx interfacing standards
ACM Transactions on Computer Systems (TOCS)
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing DRAM row activations with eager read/write clustering
ACM Transactions on Architecture and Code Optimization (TACO)
Simultaneously optimizing DRAM cache hit latency and miss rate via novel set mapping policies
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Dynamic bandwidth scaling for embedded DSPs with 3D-stacked DRAM and wide I/Os
Proceedings of the International Conference on Computer-Aided Design
Direct distributed memory access for CMPs
Journal of Parallel and Distributed Computing
Application-aware Memory System for Fair and Efficient Execution of Concurrent GPGPU Applications
Proceedings of Workshop on General Purpose Processing Using GPUs
Endurance-aware cache line management for non-volatile caches
ACM Transactions on Architecture and Code Optimization (TACO)
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.01 |
The bandwidth and latency of a memory system are strongly dependent on the manner in which accesses interact with the “3-D” structure of banks, rows, and columns characteristic of contemporary DRAM chips. There is nearly an order of magnitude difference in bandwidth between successive references to different columns within a row and different rows within a bank. This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure. Conservative reordering, in which the first ready reference in a sequence is performed, improves bandwidth by 40% for traces from five media benchmarks. Aggressive reordering, in which operations are scheduled to optimize memory bandwidth, improves bandwidth by 93% for the same set of applications. Memory access scheduling is particularly important for media processors where it enables the processor to make the most efficient use of scarce memory bandwidth.