In this article, we present a novel linear-time algorithm for data remapping that is (i) lightweight, (ii) fully automated, and (iii) applicable to pointer-centric programming languages with dynamic memory allocation support. All previous work in this area lacks one or more of these features. We then demonstrate a novel application of this algorithm as a key step in optimizing the design of an embedded memory system. Specifically, we show that the locality enhancements obtained through data remapping can reduce the memory subsystem needs of an application by 50%, and hence reduce the associated costs in size, power, and dollar investment (61%). Such a reduction overcomes a key hurdle in designing high-performance embedded computing solutions: memory subsystems are very desirable from a performance standpoint, but their costs have often limited their use in embedded systems. Our approach thus offers the intriguing possibility of compilers playing a significant role in exploring and optimizing the design space of a memory subsystem for an embedded design. To this end, and in order to properly leverage the improvements afforded by a compiler optimization, we identify a range of measures for quantifying the cost impact of popular notions of locality, prefetching, regularity of memory access, and others. The proposed methodology will become increasingly important as the need for application-specific embedded architectures becomes prevalent. In addition, we demonstrate the wide applicability of data remapping on several existing microprocessors, such as the Pentium and UltraSparc, where remapping achieves a performance improvement of 20% on average. Similarly, for a parametric research HPL-PD microprocessor, which characterizes the new Itanium machines, we achieve a performance improvement of 28% on average. All of our results are obtained using applications from the DIS, Olden, and SPEC2000 suites of integer and floating-point benchmarks.
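The abstract does not spell out the remapping algorithm itself, so the following is only a minimal sketch of the general idea behind layout-driven locality, not the authors' method. It assumes a hypothetical record with one frequently read field (`key`) and a rarely used `payload`, and regroups the records so that same-field values sit contiguously. All names (`struct record`, `struct remapped`, `remap`, `lines_touched`) and the fixed capacity `MAX_NODES` are illustrative assumptions. The helper `lines_touched` estimates how many cache lines a scan over one field touches, assuming the data starts on a line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 1024   /* hypothetical fixed capacity for the sketch */

/* Original layout: the hot key shares each record with cold payload,
   so a scan over keys also pulls the payload into the cache. */
struct record {
    int  key;           /* hot: read on every traversal step */
    char payload[60];   /* cold: rarely touched */
};

/* Remapped layout (illustrative): values of the same field from
   consecutive records are stored next to each other. */
struct remapped {
    int    key[MAX_NODES];
    char   payload[MAX_NODES][60];
    size_t count;
};

/* Copy n records into the remapped layout in a single linear pass. */
void remap(const struct record *src, size_t n, struct remapped *dst)
{
    assert(n <= MAX_NODES);
    for (size_t i = 0; i < n; i++) {
        dst->key[i] = src[i].key;
        memcpy(dst->payload[i], src[i].payload, sizeof src[i].payload);
    }
    dst->count = n;
}

/* The same computation over both layouts: only the keys are read. */
long sum_keys_original(const struct record *r, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += r[i].key;
    return sum;
}

long sum_keys_remapped(const struct remapped *r)
{
    long sum = 0;
    for (size_t i = 0; i < r->count; i++)
        sum += r->key[i];
    return sum;
}

/* Count distinct cache lines touched when reading one field from n
   elements placed `stride` bytes apart, assuming the first element
   starts on a line boundary and no field straddles two lines. */
size_t lines_touched(size_t stride, size_t n, size_t line_size)
{
    size_t count = 0;
    size_t last = (size_t)-1;   /* sentinel: no line seen yet */
    for (size_t i = 0; i < n; i++) {
        size_t line = (i * stride) / line_size;
        if (line != last) {
            count++;
            last = line;
        }
    }
    return count;
}
```

Under these assumptions, scanning eight keys spaced 64 bytes apart touches eight 64-byte lines, while the same eight keys packed 4 bytes apart fit in a single line. That is the kind of locality gain a smaller memory subsystem could exploit, although the figures reported in the abstract come from the authors' own measurements and not from a model like this one.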