The Walsh-Hadamard Transform (WHT) is an important algorithm in signal processing because of its simplicity. However, when computing large WHTs, non-unit-stride access causes poor cache performance, which severely degrades overall performance. The same cache behavior is a critical obstacle to high performance in other large signal transforms. We develop a cache-friendly technique that improves the performance of large WHTs. In our approach, data reorganization is performed between computation stages to reduce cache pollution. Furthermore, we develop an efficient search algorithm that determines the optimal factorization tree based on the problem size and the access strides in the decomposition. Experimental results show that our approach achieves up to 180% performance improvement over the state-of-the-art package on the Alpha 21264 and MIPS R10000. In addition, the proposed optimization is applicable to other signal transforms and is portable across platforms.
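The idea of reorganizing data between stages can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a single two-way split of the transform size N = n1 * n2, uses a plain transpose as the reorganization step, and all function names are hypothetical. With the factorization WHT_N = (WHT_n1 kron I_n2)(I_n1 kron WHT_n2), the second stage would normally read the data at stride n2; transposing first lets both stages work on contiguous blocks.

```python
def wht_inplace(x, start, n):
    """Unnormalized WHT on the n contiguous elements x[start:start+n].

    n must be a power of two. Uses the standard butterfly iteration.
    """
    h = 1
    while h < n:
        for i in range(start, start + n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2


def transpose(x, rows, cols):
    """Transpose a row-major rows x cols array into a row-major cols x rows array."""
    return [x[r * cols + c] for c in range(cols) for r in range(rows)]


def wht_reorganized(x, n1, n2):
    """WHT of size n1 * n2 with a transpose between the two stages.

    Assumption for illustration: one two-way split and a full transpose,
    not the factorization tree or reorganization used in the paper.
    """
    y = list(x)
    # Stage 1: (I_n1 kron WHT_n2), n1 transforms on contiguous rows of length n2.
    for r in range(n1):
        wht_inplace(y, r * n2, n2)
    # Reorganize so the stride-n2 columns become contiguous rows of length n1.
    y = transpose(y, n1, n2)
    # Stage 2: (WHT_n1 kron I_n2), now also computed at unit stride.
    for r in range(n2):
        wht_inplace(y, r * n1, n1)
    # Restore the original layout.
    return transpose(y, n2, n1)


def wht_reference(x):
    """Direct WHT of the whole vector, used only to check the result."""
    y = list(x)
    wht_inplace(y, 0, len(y))
    return y
```

For example, `wht_reorganized(x, 2, 4)` and `wht_reference(x)` should agree for any length-8 input `x`. In pure Python the transposes cost far more than they save; the sketch only shows that the reorganized computation is equivalent, while the cache benefit would appear in a compiled implementation on large inputs.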