Real-time multimedia applications demand large processing power yet require a low-power implementation in an embedded context. For programmable parallel processors, this poses new challenges in optimizing a given application for both high performance and low power. In this paper, we present a case study that applies our low-power-oriented data transfer and storage exploration methodology and couples it with a state-of-the-art performance-optimizing and parallelizing compiler. Experiments on two real-life applications show that this combined approach heavily reduces memory accesses and bus loading, and hence power, while also significantly reducing the total execution time. The experimental results also demonstrate that decomposing the detailed parallelization and the data transfer and storage exploration into two separate stages is required to obtain the important benefits of both stages without exploding the complexity of solving all the issues simultaneously.