This paper proposes a technique that allows multi-cycle computations (multiplication, division, square root, …) to complete in a single cycle. The technique is based on the notion of memoing: saving the inputs and outputs of previous calculations and reusing the stored output when the same inputs are encountered again. It is especially suitable for multimedia (MM) processing, where the local entropy of the data tends to be low, so the same operations are repeated on the same data.

The inputs and outputs of assembly-level operations are stored in cache-like lookup tables (memo-tables) that are accessed in parallel with the conventional computation. A successful lookup delivers the result of a multi-cycle computation in a single cycle, and a failed lookup incurs no penalty in computation time.

Simulation results show that, on average, with a modestly sized memo-table, about 40% of the floating-point multiplications and 50% of the floating-point divisions in multimedia applications can be avoided by using the values in the memo-table, leading to an average computational speedup of more than 20%.
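The memoing idea can be illustrated in software with a minimal sketch. This is not the paper's hardware design: the class name `MemoTable`, the direct-mapped organization, the 32-entry size, and the XOR-fold index function are all assumptions chosen for illustration. In hardware the lookup would run in parallel with the functional unit; here it is modeled sequentially, and the sketch only counts how often a stored result could replace a computation.

```python
import operator
import struct


def float_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an unsigned integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


class MemoTable:
    """Direct-mapped memo table for a two-operand operation (hypothetical sketch)."""

    def __init__(self, op, entries=32):
        # The entry count is an assumption; it must be a power of two
        # so that the index can be taken with a bit mask.
        assert entries > 0 and entries & (entries - 1) == 0
        self.op = op
        self.mask = entries - 1
        self.slots = [None] * entries
        self.hits = 0
        self.misses = 0

    def _index(self, a_bits: int, b_bits: int) -> int:
        # Assumed index function: XOR the operands, then fold the
        # upper bits down so exponent and high mantissa bits contribute.
        x = a_bits ^ b_bits
        x ^= x >> 32
        x ^= x >> 16
        return x & self.mask

    def compute(self, a: float, b: float) -> float:
        a_bits, b_bits = float_bits(a), float_bits(b)
        i = self._index(a_bits, b_bits)
        slot = self.slots[i]
        # Hit: both operand bit patterns match the stored tag.
        if slot is not None and slot[0] == a_bits and slot[1] == b_bits:
            self.hits += 1
            return slot[2]
        # Miss: perform the real operation and overwrite the slot.
        self.misses += 1
        result = self.op(a, b)
        self.slots[i] = (a_bits, b_bits, result)
        return result


# Usage: scale a run of pixel values with low local entropy by a fixed gain.
pixels = [0.5, 0.5, 0.5, 0.25, 0.25, 0.5]
gain = 1.5
table = MemoTable(operator.mul)
scaled = [table.compute(p, gain) for p in pixels]
hit_ratio = table.hits / (table.hits + table.misses)
```

Because neighboring pixels repeat, most of the multiplications in this toy input find their operands already in the table. The hit ratio on real workloads depends on the data and the table size, so the figure from this example says nothing about the percentages reported above.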