Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Procedure merging with instruction caches
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
COMPCON '92 Proceedings of the thirty-seventh international conference on COMPCON
COSYN: hardware-software co-synthesis of embedded systems
DAC '97 Proceedings of the 34th annual Design Automation Conference
Memory data organization for improved cache performance in embedded processor applications
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Code placement techniques for cache miss rate reduction
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Power optimization of variable voltage core-based systems
DAC '98 Proceedings of the 35th annual Design Automation Conference
A framework for estimation and minimizing energy dissipation of embedded HW/SW systems
DAC '98 Proceedings of the 35th annual Design Automation Conference
Hardware/software co-synthesis with memory hierarchies
Proceedings of the 1998 IEEE/ACM international conference on Computer-aided design
Augmenting Loop Tiling with Data Alignment for Improved Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
Cycle-accurate simulation of energy consumption in embedded systems
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Performance estimation of embedded software with instruction cache modeling
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Efficient power co-estimation techniques for system-on-chip design
DATE '00 Proceedings of the conference on Design, automation and test in Europe
DATE '00 Proceedings of the conference on Design, automation and test in Europe
Code placement in hardware/software co-synthesis to improve performance and reduce cost
Proceedings of the conference on Design, automation and test in Europe
A hybrid approach for core-based system-level power modeling
ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
Improving cache Performance Through Tiling and Data Alignment
IRREGULAR '97 Proceedings of the 4th International Symposium on Solving Irregularly Structured Problems in Parallel
Efficient Utilization of Scratch-Pad Memory in Embedded Processor Applications
EDTC '97 Proceedings of the 1997 European conference on Design and Test
Behavioral Array Mapping into Multiport Memories Targeting Low Power
VLSID '97 Proceedings of the Tenth International Conference on VLSI Design: VLSI in Multimedia Applications
A data alignment technique for improving cache performance
ICCD '97 Proceedings of the 1997 International Conference on Computer Design (ICCD '97)
Memory Organization for Improved Data Cache Performance in Embedded Processors
ISSS '96 Proceedings of the 9th international symposium on System synthesis
Code Transformations for Low Power Caching in Embedded Multimedia Processors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Application-driven synthesis of memory-intensive systems-on-chip
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hardware/software co-synthesis with memory hierarchies
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
In this paper, we present a novel and fast constructive technique that relocates the instruction code in such a manner into the main memory that the cache is utilized more efficiently. The technique is applied as a preprocessing step, i.e., before the code is executed. Our technique is applicable in embedded systems where the number and characteristics of tasks running on the system is known a priori. The technique does not impose any computational overhead to the system. As a result of applying our technique to a variety of real-world applications we observed through simulation a significant drop of cache misses. Furthermore, the energy consumption of the whole system (CPU, caches, buses, main memory) is reduced by up to 65%. These benefits could be achieved by a slightly increased main memory size of about 13% on average.