Concurrency and Computation: Practice & Experience
Homomorphic hash functions play a key role in securing distributed systems that use coding techniques such as erasure coding and network coding. Their computational cost remains a major challenge. In this paper, we present a massively parallel solution, named Tsunami, that exploits widely available many-core graphics processing units (GPUs). Tsunami combines the following optimization techniques to achieve the highest-ever hashing throughput: (1) using Montgomery multiplication and precomputation to speed up modular exponentiation; (2) using a clean implementation of Montgomery multiplication to reduce the demand for registers and shared memory and to increase the utilization of GPU processing cores; (3) using our own assembly code to implement 32-bit integer multiplication, which outperforms the assembly code generated by the native compiler by 20%; and (4) exploiting memory alignment and constant memory on GPUs to improve the efficiency of memory access. By integrating these techniques, Tsunami achieves a significant improvement over existing results. Specifically, the hashing throughput achieved by Tsunami on a GTX295 GPU (NVIDIA, Santa Clara, CA, US) is about 33 times that of the existing solution on a quad-core CPU. We also show that the hashing throughput grows almost linearly with the number of GPU cores. Copyright © 2011 John Wiley & Sons, Ltd.
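To make the two central ideas concrete, the following is a minimal CPU-side sketch in Python, not Tsunami's GPU implementation. It assumes the common discrete-logarithm-style construction, in which the hash of a block (b_1, ..., b_m) is the product of g_i^{b_i} mod p; the abstract does not specify the exact hash. The toy parameters (p = 23, q = 11), the generator list, and all function names are hypothetical and chosen only for illustration; real deployments use moduli of roughly a thousand bits. Precomputation and the GPU-specific optimizations are not shown.

```python
# Illustrative sketch only: toy parameters, hypothetical names (requires Python 3.8+).

P = 23                 # toy prime modulus (assumption)
Q = 11                 # prime order of the subgroup; Q divides P - 1 (assumption)
GENS = [4, 9, 2, 13]   # elements of order Q modulo P (squares of 2, 3, 5, 6)
BITS = 5               # Montgomery radix R = 2**BITS, which must exceed P
N_PRIME = (-pow(P, -1, 1 << BITS)) % (1 << BITS)  # P * N_PRIME == -1 (mod R)


def mont_mul(a_bar, b_bar, n, bits, n_prime):
    """Montgomery product: returns a_bar * b_bar * R^-1 mod n without dividing by n."""
    mask = (1 << bits) - 1
    t = a_bar * b_bar
    m = ((t & mask) * n_prime) & mask   # makes t + m*n divisible by R
    u = (t + m * n) >> bits             # exact division by R is a shift
    return u - n if u >= n else u


def mont_pow(base, exp, n, bits, n_prime):
    """Square-and-multiply exponentiation carried out in Montgomery form."""
    r = 1 << bits
    x_bar = (base * r) % n              # convert the base into Montgomery form
    acc = r % n                         # Montgomery form of 1
    while exp:
        if exp & 1:
            acc = mont_mul(acc, x_bar, n, bits, n_prime)
        x_bar = mont_mul(x_bar, x_bar, n, bits, n_prime)
        exp >>= 1
    return mont_mul(acc, 1, n, bits, n_prime)  # convert back out of Montgomery form


def homomorphic_hash(block, gens=GENS, p=P):
    """Hash a block of exponents in Z_Q as the product of g_i ** b_i mod p."""
    h = 1
    for g, b in zip(gens, block):
        h = (h * mont_pow(g, b, p, BITS, N_PRIME)) % p
    return h


# Homomorphic property: the hash of a linear combination of blocks equals the
# matching combination of the individual hashes.
x = [3, 7, 1, 10]
y = [5, 2, 9, 4]
a, b = 6, 8
combo = [(a * xi + b * yi) % Q for xi, yi in zip(x, y)]
lhs = homomorphic_hash(combo)
rhs = (pow(homomorphic_hash(x), a, P) * pow(homomorphic_hash(y), b, P)) % P
print(lhs == rhs)
```

The final check is what lets a receiver verify a coded block against the published hashes of the original blocks without decoding it. Each `mont_pow` call is independent of the others, which is the kind of work a GPU can spread across many cores.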