Code generation for embedded processors creates opportunities for several performance optimizations that are not applicable to traditional compilers. We present techniques for improving data cache performance by organizing the variables declared in embedded code in memory according to specific parameters of the data cache. Our approach clusters variables to minimize compulsory cache misses and solves the memory assignment problem to minimize conflict cache misses. Experiments with code kernels from DSP and other domains on the LSI Logic CW4001 embedded processor demonstrate a significant improvement in data cache performance (an average of 46% in hit ratios) from applying our memory organization technique.
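The two steps named in the abstract, clustering and memory assignment, can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a direct-mapped cache, uses a simple greedy heuristic that merges variables accessed back to back, and places each cluster on its own cache line. All function names, the variable sizes, and the access trace are hypothetical and chosen only for illustration.

```python
from collections import Counter

def hit_ratio(layout, trace, line_size, num_lines):
    """Simulate a direct-mapped cache; layout maps variable -> byte address."""
    tags = [None] * num_lines          # memory block currently held by each line
    hits = 0
    for var in trace:
        block = layout[var] // line_size
        index = block % num_lines
        if tags[index] == block:
            hits += 1
        else:
            tags[index] = block        # miss: fetch the block into this line
    return hits / len(trace)

def cluster(trace, sizes, line_size):
    """Greedy heuristic (an assumption): merge variables that are accessed
    back to back, as long as the merged cluster still fits in one line."""
    affinity = Counter()
    for a, b in zip(trace, trace[1:]):
        if a != b:
            affinity[tuple(sorted((a, b)))] += 1
    owner = {v: [v] for v in sizes}    # variable -> the cluster list it belongs to
    for (a, b), _count in affinity.most_common():
        ca, cb = owner[a], owner[b]
        if ca is cb:
            continue
        if sum(sizes[v] for v in ca + cb) <= line_size:
            ca.extend(cb)
            for v in cb:
                owner[v] = ca
    clusters, seen = [], set()
    for v in sizes:                    # collect each distinct cluster once
        if id(owner[v]) not in seen:
            seen.add(id(owner[v]))
            clusters.append(owner[v])
    return clusters

def assign(clusters, sizes, line_size):
    """Naive assignment (an assumption): one line-aligned slot per cluster."""
    layout = {}
    for i, members in enumerate(clusters):
        addr = i * line_size
        for v in members:
            layout[v] = addr
            addr += sizes[v]
    return layout

# Hypothetical example: six 4-byte variables, a cache of 2 lines x 8 bytes.
sizes = {v: 4 for v in "abcdef"}
trace = ["a", "e"] * 4 + ["b", "f"] * 4 + ["c", "d"] * 4
LINE_SIZE, NUM_LINES = 8, 2

# Baseline: variables laid out in declaration order.
declared = {v: 4 * i for i, v in enumerate(sizes)}
naive = hit_ratio(declared, trace, LINE_SIZE, NUM_LINES)

clusters = cluster(trace, sizes, LINE_SIZE)
tuned = hit_ratio(assign(clusters, sizes, LINE_SIZE), trace, LINE_SIZE, NUM_LINES)
print(f"declaration order: {naive:.3f}, clustered: {tuned:.3f}")
```

In this toy trace, declaration order puts `a` and `e` in different memory blocks that map to the same cache line, so they evict each other on every access. Packing co-accessed variables into one line means a single fetch serves both, and the line-aligned placement keeps the pairs from colliding while they are in use. A realistic assignment step would also have to handle arrays and more clusters than cache lines, which this sketch does not attempt.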