On the effective bandwidth of interleaved memories in vector processor systems
IEEE Transactions on Computers
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
Pseudo-randomly interleaved memory
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Improving Cache Locality by a Combination of Loop and Data Transformations
IEEE Transactions on Computers - Special issue on cache memory and related problems
Minimizing Conflicts Between Vector Streams in Interleaved Memory Systems
IEEE Transactions on Computers
Reducing Interference Among Vector Accesses in Interleaved Memories
IEEE Transactions on Computers
Buffered Banks in Multiprocessor Systems
IEEE Transactions on Computers
Memory access reordering in vector processors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Flow Graph Balancing for Minimizing the Required Memory Bandwidth
ISSS '96 Proceedings of the 9th international symposium on System synthesis
Cache miss equations: compiler analysis framework for tuning memory behavior
Cache miss equations: compiler analysis framework for tuning memory behavior
Proceedings of the 38th annual Design Automation Conference
EMSOFT '01 Proceedings of the First International Workshop on Embedded Software
Fast, accurate design space exploration of embedded systems memory configurations
Proceedings of the 2007 ACM symposium on Applied computing
Hi-index | 0.00 |
Memory interleaving is a cost-efficient approach to increase bandwidth. Improving data access locality and reducing memory access conflicts are two important aspects to achieve high efficiency for interleaved memory. In this paper, we introduce a design framework that integrates these two optimizations, in order to find out minimal memory banks and channels required in the embedded system under performance restriction. Several important techniques, loop and data layout transformations for data access locality, extracting data streams, conflict cache miss reduction as well as data placement and optimally reordered access for interleaved memories, are incorporated in the design framework. Experiments show that our co-design method results in substantially less hardware requirement compared to the implementations without optimization.