Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Conflict-Free Vector Access Using a Dynamic Storage Scheme
IEEE Transactions on Computers
Pseudo-randomly interleaved memory
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
On high-bandwidth data cache design for multi-issue processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving Cache Locality by a Combination of Loop and Data Transformations
IEEE Transactions on Computers - Special issue on cache memory and related problems
Synthesis of custom interleaved memory systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IEEE Transactions on Computers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Storage Management Programmable Process
Storage Management Programmable Process
Conflict-Free Access for Streams in Multimodule Memories
IEEE Transactions on Computers
Block, Multistride Vector, and FFT Accesses in Parallel Memory Systems
IEEE Transactions on Parallel and Distributed Systems
A Framework for Parallelizing Load/Stores on Embedded Processors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Compiler-Directed Array Interleaving for Reducing Energy in Multi-Bank Memories
ASP-DAC '02 Proceedings of the 2002 Asia and South Pacific Design Automation Conference
Maps: a compiler-managed memory system for software-exposed architectures
Maps: a compiler-managed memory system for software-exposed architectures
Reducing Design Complexity of the Load/Store Queue
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Design Methodology for a Tightly Coupled VLIW/Reconfigurable Matrix Architecture: A Case Study
Proceedings of the conference on Design, automation and test in Europe - Volume 2
Custom Data Layout for Memory Parallelism
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
On Design of Parallel Memory Access Schemes for Video Coding
Journal of VLSI Signal Processing Systems
Load-Store Queue Management: an Energy-Efficient Design Based on a State-Filtering Mechanism.
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Fire-and-Forget: Load/Store Scheduling with No Store Queue at All
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Fast, Efficient and Predictable Memory Accesses: Optimization Algorithms for Memory Architecture Aware Compilation
Late-binding: enabling unordered load-store queues
Proceedings of the 34th annual international symposium on Computer architecture
Vector processing as an enabler for software-defined radio in handheld devices
EURASIP Journal on Applied Signal Processing
Memory scheduling for modern microprocessors
ACM Transactions on Computer Systems (TOCS)
Placement-and-routing-based register allocation for coarse-grained reconfigurable arrays
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
An automatic scratch pad memory management tool and MPEG-4 encoder case study
Proceedings of the 45th annual Design Automation Conference
A coarse-grained array based baseband processor for 100Mbps+ software defined radio
Proceedings of the conference on Design, automation and test in Europe
Edge-centric modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
SPR: an architecture-adaptive CGRA mapping tool
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
Recurrence cycle aware modulo scheduling for coarse-grained reconfigurable architectures
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Energy-performance Exploration of a CGA-based SDR Processor
Journal of Signal Processing Systems
Hi-index | 0.00 |
This paper presents a memory organization for SDR inner modem baseband processors that focus on exploiting ILP. This memory organization uses power-efficient, single-ported, interleaved scratch-pad memory banks to provide enough bandwidth to a high-ILP processors. A system of queues in the memory interface is used to resolve bank conflicts among the single-ported banks, and to spread long bursts of conflicting accesses to the same bank over time. Bank address rotation is used to spread long bursts of conflicting accesses over multiple banks. All proposed techniques have been implemented in hardware, and are evaluated for a number of different wireless communication standards. For the 11a|n benchmarks, the overhead of stall cycles resulting from unresolved bank conflicts can be reduced to below 2% with the proposed organization. For 3GPP-LTE, the most demanding wireless standard we evaluated, the overhead is reduced to less than 0.13%. This is achieved with little energy and area overhead, and without any bank-aware compiler support.