CULZSS: LZSS Lossless Data Compression on CUDA
CLUSTER '11 Proceedings of the 2011 IEEE International Conference on Cluster Computing
In this paper, we present an algorithm and the design improvements needed to port the serial Lempel-Ziv-Storer-Szymanski (LZSS) lossless data compression algorithm to a parallel version suitable for general-purpose graphics processing units (GPGPUs), specifically NVIDIA's CUDA framework. The two main stages of the algorithm, substring matching and encoding, are studied in detail and adapted to fit the GPU architecture. We conducted a detailed analysis of our performance results and compared them to serial and parallel CPU implementations of the LZSS algorithm. We also benchmarked our algorithm against the well-known, widely used programs GZIP and ZLIB. We achieved up to 34x better throughput than the serial CPU implementation of the LZSS algorithm and up to 2.21x better than the parallelized version.
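The two stages the abstract names, substring matching over a sliding window and token encoding, can be sketched in serial C as below. This is an illustrative assumption, not the paper's CUDA implementation: the window size, match-length limits, and byte-level token layout are all hypothetical choices made for the sketch.

```c
/* Minimal serial LZSS-style encoder sketch (hypothetical parameters,
 * not the CULZSS code). Token stream: a flag byte per token, where
 * 0 is followed by one literal byte and 1 by an (offset, length) pair. */
#include <assert.h>
#include <stddef.h>

#define WINDOW    255  /* how far back a match may reach (assumed)     */
#define MIN_MATCH 3    /* shorter matches are cheaper left as literals */
#define MAX_MATCH 18   /* cap so length fits in one byte (assumed)     */

/* Compress n bytes of `in` into `out` (out must hold up to 2*n bytes);
 * returns the number of output bytes written. */
static size_t lzss_compress(const unsigned char *in, size_t n,
                            unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        size_t best_len = 0, best_off = 0;
        size_t start = i > WINDOW ? i - WINDOW : 0;
        /* Stage 1, substring matching: scan the sliding window for
         * the longest match starting at position i. */
        for (size_t j = start; j < i; j++) {
            size_t len = 0;
            while (len < MAX_MATCH && i + len < n &&
                   in[j + len] == in[i + len])
                len++;
            if (len > best_len) { best_len = len; best_off = i - j; }
        }
        /* Stage 2, encoding: emit a back-reference if the match is
         * long enough to pay for itself, otherwise a literal. */
        if (best_len >= MIN_MATCH) {
            out[o++] = 1;
            out[o++] = (unsigned char)best_off;
            out[o++] = (unsigned char)best_len;
            i += best_len;
        } else {
            out[o++] = 0;
            out[o++] = in[i++];
        }
    }
    return o;
}
```

For example, the 12-byte input "abcabcabcabc" compresses to three literal tokens for "abc" followed by a single (offset 3, length 9) back-reference, 9 output bytes in all. The window scan is the expensive, data-parallel part, which is why the GPU port focuses on it.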