Real-Time Parallel MPEG-2 Decoding in Software
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Master/slave speculative parallelization
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Parallel Multiple-Symbol Variable-Length Decoding
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
Improving Value Communication for Thread-Level Speculation
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Spice: speculative parallel iteration chunk execution
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Guaranteed Synchronization of Huffman Codes
DCC '08 Proceedings of the Data Compression Conference
Copy or Discard execution model for speculative parallelization on multicores
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Software transactional memory for multicore embedded systems
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Speculative parallelization using software multi-threaded transactions
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Speculative parallelization using state separation and multiple value prediction
Proceedings of the 2010 international symposium on Memory management
A universal algorithm for sequential data compression
IEEE Transactions on Information Theory
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
Variable-length coding is widely used for efficient data compression. Typically, the compressor splits the original data into blocks and compresses each block with variable-length codes, hence producing variable-length compressed blocks. Although the compressor can easily exploit ample block-level parallelism, it is much more difficult to extract such coarse-grain parallelism from the decompressor because a block boundary cannot be located until decompression of the previous block is completed. This paper presents novel algorithms to efficiently predict block boundaries and a runtime system that enables efficient block-level parallel decompression, called SDM. The SDM execution model features speculative pipelining with three stages: Scanner, Decompressor, and Merger. The scanner stage employs a high-confidence prediction algorithm that finds compressed block boundaries without fully decompressing individual blocks. This information is communicated to the parallel decompressor stage in which multiple blocks are decompressed in parallel. The decompressed blocks are merged in order by the merger stage to produce the final output. The SDM runtime is specialized to execute this pipeline correctly and efficiently on resource-constrained embedded platforms. With SDM we effectively parallelize three production-grade variable-length decompression algorithms?zlib, bzip2, and H.264?with maximum speedups of 2.50× and 8.53× (and geometric mean speedups of 1.96× and 4.04×) on 4-core and 36-core embedded platforms, respectively.