We propose a way to improve the performance of embedded processors running data-intensive applications by allowing software to allocate on-chip memory on an application-specific basis. On-chip memory in the form of cache can be made to act like scratchpad memory via a novel hardware mechanism, which we call column caching. Column caching enables dynamic cache partitioning in software by mapping data regions to a specified set of cache “columns” or “ways.” When a region of memory is exclusively mapped to an equivalently sized partition of cache, column caching provides the same functionality and predictability as a dedicated scratchpad memory for time-critical parts of a real-time application. The ratio between scratchpad size and cache size can be varied easily and quickly for each application, or for each task within an application. Thus, software has much finer control of on-chip memory and can dynamically trade off performance for on-chip memory.
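The partitioning idea can be illustrated with a toy simulator. This is only a sketch under stated assumptions, not the hardware design from the paper: the class `ColumnCache`, its methods `map_region` and `access`, the LRU policy, and the cache geometry are all hypothetical names and choices made for illustration. It assumes that lookups search every column as in an ordinary set-associative cache, and that only the choice of replacement victim is restricted to the columns a region is mapped to.

```python
class ColumnCache:
    """Toy set-associative cache with per-region replacement columns (hypothetical)."""

    def __init__(self, num_sets, num_ways, line_size):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.line_size = line_size
        # tags[s][w] holds the line number cached in set s, column w (or None).
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # last_use[s][w] is a timestamp used for LRU among permitted columns.
        self.last_use = [[0] * num_ways for _ in range(num_sets)]
        self.clock = 0
        self.regions = []  # list of (start, end, allowed_columns)
        self.default_columns = tuple(range(num_ways))

    def map_region(self, start, size, columns):
        """Restrict replacement for [start, start + size) to the given columns."""
        self.regions.append((start, start + size, tuple(columns)))

    def _columns_for(self, addr):
        for start, end, cols in self.regions:
            if start <= addr < end:
                return cols
        return self.default_columns

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        self.clock += 1
        line = addr // self.line_size
        s = line % self.num_sets
        # Lookup checks every column, as in an ordinary set-associative cache.
        for w in range(self.num_ways):
            if self.tags[s][w] == line:
                self.last_use[s][w] = self.clock
                return True
        # Miss: choose the LRU victim only among the columns this address may use.
        cols = self._columns_for(addr)
        victim = min(cols, key=lambda w: self.last_use[s][w])
        self.tags[s][victim] = line
        self.last_use[s][victim] = self.clock
        return False


def demo(partitioned):
    """Count hits on a 64-byte buffer after unrelated streaming traffic."""
    cache = ColumnCache(num_sets=4, num_ways=4, line_size=16)
    buf_start, buf_size = 0x1000, 64  # exactly one column: 4 sets * 16 bytes
    if partitioned:
        # Give the buffer column 0 exclusively; everything else uses columns 1-3.
        cache.map_region(buf_start, buf_size, columns=[0])
        cache.default_columns = (1, 2, 3)
    for addr in range(buf_start, buf_start + buf_size, 16):
        cache.access(addr)  # warm-up: bring the buffer into the cache
    for addr in range(0x8000, 0x8000 + 1024, 16):
        cache.access(addr)  # streaming data that competes for the same sets
    return sum(cache.access(a) for a in range(buf_start, buf_start + buf_size, 16))


print(demo(partitioned=True))   # 4: the buffer behaves like scratchpad memory
print(demo(partitioned=False))  # 0: streaming traffic evicted the buffer
```

In this sketch the buffer is the same size as one column, so once it is loaded it can never be evicted by other data, which is the scratchpad-like predictability the abstract describes. Changing the column list passed to `map_region` changes the scratchpad-to-cache ratio without any change to the cache itself.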