Accelerating multi-media processing by implementing memoing in multiplication and division units

Authors:
Daniel Citron;Dror Feitelson;Larry Rudolph
Affiliations:
Department of Computer Science, The Hebrew University of Jerusalem, 91904 Jerusalem, Israel;Department of Computer Science, The Hebrew University of Jerusalem, 91904 Jerusalem, Israel;Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA
Venue:
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Year:
1998

Abstract

This paper proposes a technique that enables multi-cycle computations (multiplication, division, square-root, …) to be performed in a single cycle. The technique is based on the notion of memoing: saving the inputs and outputs of previous calculations and reusing the output if the same input is encountered again. This technique is especially suitable for Multi-Media (MM) processing. In MM applications the local entropy of the data tends to be low, which results in repeated operations on the same datum.

The inputs and outputs of assembly-level operations are stored in cache-like lookup tables and accessed in parallel with the conventional computation. A successful lookup gives the result of a multi-cycle computation in a single cycle, and a failed lookup incurs no penalty in computation time.

Simulation results show that, on average, for a modestly sized memo-table, about 40% of the floating point multiplications and 50% of the floating point divisions in Multi-Media applications can be avoided by using the values within the memo-table, leading to an average computational speedup of more than 20%.
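
The memo-table idea described in the abstract can be illustrated with a small software sketch. This is not the authors' hardware design: the class name `MemoTable`, the direct-mapped organization, the 32-entry default, and the CRC-based index function are all assumptions chosen for illustration. In hardware the lookup would proceed in parallel with the functional unit, whereas this sketch checks the table first and computes only on a miss.

```python
import operator
import struct
import zlib


class MemoTable:
    """Hypothetical direct-mapped memo table for a two-operand operation."""

    def __init__(self, op, entries=32):
        self.op = op                      # the "multi-cycle" operation, e.g. division
        self.slots = [None] * entries     # each slot holds (operand_bits, result)
        self.lookups = 0
        self.hits = 0

    def compute(self, a, b):
        # Use the raw bit patterns of both operands as the tag, so that
        # values such as 0.0 and -0.0 are never confused.
        key = struct.pack("<dd", a, b)
        # Assumed index function: a CRC of the operand bits, reduced to
        # the table size. Real hardware would pick a few operand bits.
        index = zlib.crc32(key) % len(self.slots)
        self.lookups += 1
        slot = self.slots[index]
        if slot is not None and slot[0] == key:
            self.hits += 1                # hit: reuse the stored result
            return slot[1]
        result = self.op(a, b)            # miss: do the full computation
        self.slots[index] = (key, result) # replace whatever was in the slot
        return result

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


# Low-entropy data, as in neighbouring pixels of an image.
div = MemoTable(operator.truediv)
pixels = [12.0, 12.0, 13.0, 12.0, 13.0, 13.0]
scaled = [div.compute(p, 255.0) for p in pixels]
print(scaled, div.hit_ratio())
```

Because a miss simply falls through to the normal computation, the results are identical to those of the unmemoized operation; only the number of full computations changes. The hit ratio observed depends on the table size, the index function, and the data.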