On the potential of tolerant region reuse for multimedia applications

Authors:
Carlos Álvarez;Jesús Corbal;Esther Salamí;Mateo Valero
Affiliations:
Departament d'Arquitectura de Computadors, UPC. Universitat Politecnica de Catalunya-Barcelona, Spain (all authors)
Venue:
ICS '01 Proceedings of the 15th international conference on Supercomputing
Year:
2001

Abstract

Recent years have shown an interesting evolution in the mid-end to low-end embedded domain. Portable systems are growing in importance as their storage capacity and their ability to interact with general-purpose systems improve. Furthermore, media processing is changing the way embedded processors are designed, with new application domains in mind such as PDA systems or third-generation mobile digital phones (UMTS).

The performance requirements of this new kind of device are not those of the general-purpose domain, where the primary goal has traditionally been the highest performance. Embedded systems must meet ever-increasing real-time requirements as well as power consumption constraints. In this scenario, instruction/region reuse emerges as a promising way to increase the performance of embedded media processors while reducing power consumption. Moreover, media and signal processing applications are a suitable target for instruction/region reuse, given the large amount of redundancy found in media data working sets.

In this paper we propose a novel region reuse mechanism that takes advantage of the tolerance of media algorithms to losses in computational precision. By identifying regions of code where an input data set is processed into an output data set, we can skip a computational instance and reuse the result of a previous one with a similar input data set (hence the term tolerant reuse). We show that, in a typical JPEG encoder application, conventional region reuse barely provides more than an 8% reduction in executed instructions, even with significantly large tables. When applying the concept of tolerance, on the other hand, we achieve a reduction of more than 25% in executed instructions with tables smaller than 1 KB (with only small degradation in output image quality), and a reduction of up to 40% (with no visually perceptible differences) with larger tables.
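
The idea of tolerant reuse described in the abstract can be illustrated with a small software sketch. The abstract does not say how "similar" inputs are detected, so this sketch assumes one simple possibility: dropping a fixed number of low-order bits from each input value before looking it up in the reuse table. The names TolerantReuseTable, tolerance_bits, max_entries and block_sum_of_squares are hypothetical, and the FIFO replacement policy is also an assumption. None of this is taken from the paper.

```python
from typing import Callable, Dict, Sequence, Tuple


class TolerantReuseTable:
    """Software model of a reuse table for one code region (hypothetical).

    Inputs are matched after dropping `tolerance_bits` low-order bits, so
    inputs that differ only in those bits share one table entry.
    With tolerance_bits=0 this degenerates to exact (conventional) reuse.
    """

    def __init__(self, region: Callable[[Sequence[int]], int],
                 tolerance_bits: int = 0, max_entries: int = 64) -> None:
        self.region = region
        self.tolerance_bits = tolerance_bits
        self.max_entries = max_entries
        self.table: Dict[Tuple[int, ...], int] = {}
        self.hits = 0
        self.misses = 0

    def _key(self, inputs: Sequence[int]) -> Tuple[int, ...]:
        # Assumed similarity test: compare inputs with low bits removed.
        return tuple(v >> self.tolerance_bits for v in inputs)

    def __call__(self, inputs: Sequence[int]) -> int:
        key = self._key(inputs)
        if key in self.table:
            # Reuse: skip the region and return the stored (approximate) result.
            self.hits += 1
            return self.table[key]
        self.misses += 1
        result = self.region(inputs)
        if len(self.table) >= self.max_entries:
            # Assumed FIFO replacement; dicts preserve insertion order.
            self.table.pop(next(iter(self.table)))
        self.table[key] = result
        return result


def block_sum_of_squares(pixels: Sequence[int]) -> int:
    """Stand-in for a media kernel region: maps an input block to one output."""
    return sum(p * p for p in pixels)


if __name__ == "__main__":
    blocks = [(100, 101, 102, 103), (101, 101, 103, 103), (200, 201, 202, 203)]
    for bits in (0, 2):
        reuse = TolerantReuseTable(block_sum_of_squares, tolerance_bits=bits)
        outputs = [reuse(b) for b in blocks]
        print(bits, outputs, reuse.hits, reuse.misses)
```

With tolerance_bits=0 the first two blocks miss separately, while with tolerance_bits=2 the second block reuses the first block's result and so returns a slightly inexact value. That is the trade-off the abstract describes: fewer executed instructions in exchange for a small loss of output precision.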