Tool support for software lookup table optimization
Scientific Programming
Scientific programmers strive constantly to meet performance demands. Tuning is often done manually, despite the significant development time and effort required. One example is lookup table (LUT) optimization, a technique that is generally applied by hand due to a lack of methodology and tools. LUT methods reduce execution time by replacing computations with memory accesses to precomputed tables of results. LUT optimizations improve performance when the memory access is faster than the original computation and the level of reuse is sufficient to amortize LUT initialization. Current practice requires programmers to inspect program source code to identify candidate expressions, then develop specific LUT code for each optimization. Measurement of LUT accuracy is usually ad hoc, and the interaction with multicore parallelization has not been explored. In this paper, we present Mesa, a standalone tool that implements error analysis and code generation to improve the process of LUT optimization. We evaluate Mesa on a multicore system using a molecular biology application and other scientific expressions. Our LUT optimizations realize a performance improvement of 5X for the application and up to 45X for the expressions, while tightly controlling error. We also show that the serial optimization is just as effective on a parallel version of the application. Our research provides a methodology and tool for incorporating LUT optimizations into existing scientific code.
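The general LUT idea described above can be illustrated with a minimal sketch. It is not Mesa's generated code or its error analysis. The target function `exp(-x)`, the domain `[0, 10]`, the table size, and all function names are assumptions chosen for illustration. The sketch precomputes samples, replaces evaluation with a nearest-sample table access, and then measures the worst observed error on a probe grid rather than guessing it.

```python
import math

def build_lut(f, lo, hi, n):
    """Precompute f at n evenly spaced sample points on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    table = [f(lo + i * step) for i in range(n)]
    return table, step

def lut_eval(table, lo, step, x):
    """Approximate f(x) by the nearest precomputed sample (no interpolation)."""
    i = int((x - lo) / step + 0.5)
    i = min(max(i, 0), len(table) - 1)  # clamp out-of-range inputs to the table ends
    return table[i]

def max_abs_error(f, table, lo, hi, step, probes=10_000):
    """Empirical error estimate: compare the LUT against f on a dense probe grid."""
    worst = 0.0
    for k in range(probes):
        x = lo + (hi - lo) * k / (probes - 1)
        worst = max(worst, abs(f(x) - lut_eval(table, lo, step, x)))
    return worst

def target(x):
    """Hypothetical expensive expression standing in for a candidate computation."""
    return math.exp(-x)

# Assumed domain and table size; a real optimization would pick these from
# the application's input range and its error tolerance.
LO, HI, N = 0.0, 10.0, 4096
table, step = build_lut(target, LO, HI, N)
err = max_abs_error(target, table, LO, HI, step)
```

For nearest-sample lookup, the error is bounded by half the sample spacing times the largest slope of the function on the domain, so enlarging the table trades memory for accuracy. In pure Python the table access is unlikely to beat `math.exp`; the sketch shows the mechanics and the error measurement, not the speedup reported for compiled scientific code.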