Using data compression for increasing memory system utilization

Authors:
Ozcan Ozturk;Mahmut Kandemir;Mary Jane Irwin
Affiliations:
Department of Computer Engineering, Bilkent University, Ankara, Turkey;Microsystems Design Laboratory, Computer Science and Engineering Department, The Pennsylvania State University, University Park, PA;Computer Science and Engineering Department, The Pennsylvania State University, University Park, PA
Venue:
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Year:
2009

Abstract

The memory system presents one of the critical challenges in embedded system design and optimization. This is mainly due to the ever-increasing code complexity of embedded applications and the exponential increase seen in the amount of data they manipulate. The memory bottleneck is even more important for multiprocessor-system-on-a-chip (MPSoC) architectures due to the high cost of off-chip memory accesses in terms of both energy and performance. As a result, reducing the memory-space occupancy of embedded applications is very important and will be even more important in the next decade. While it is true that the on-chip memory capacity of embedded systems is continuously increasing, the increases in the complexity of embedded applications and the sizes of the data sets they process are far greater. Motivated by this observation, this paper presents and evaluates a compiler-driven approach to data compression for reducing memory-space occupancy. Our goal is to study how automated compiler support can help in deciding the set of data elements to compress/decompress and the points during execution at which these compressions/decompressions should be performed. We first study this problem in the context of single-core systems and then extend it to MPSoCs where we schedule compressions and decompressions intelligently such that they do not conflict with application execution as much as possible. Particularly, in MPSoCs, one needs to decide which processors should participate in the compression and decompression activities at any given point during the course of execution. We propose both static and dynamic algorithms for this purpose. In the static scheme, the processors are divided into two groups: those performing compression/decompression and those executing the application, and this grouping is maintained throughout the execution of the application. In the dynamic scheme, on the other hand, the execution starts with some grouping but this grouping can change during the course of execution, depending on the dynamic variations in the data access pattern. Our experimental results show that, in a single-core system, the proposed approach reduces maximum memory occupancy by 47.9% and average memory occupancy by 48.3% when averaged over all the benchmarks. Our results also indicate that, in an MPSoC, the average energy saving is 12.7% when all eight benchmarks are considered. While compressions and decompressions and related bookkeeping activities take extra cycles and memory space and consume additional energy, we found that the improvements they bring from the memory space, execution cycles, and energy perspectives are much higher than these overheads.
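
The idea of choosing which data to keep compressed, and when to decompress it, can be illustrated with a toy single-core model. The following is a minimal sketch under stated assumptions, not the algorithm evaluated in the paper: the function name `simulate_occupancy`, the `lookahead` policy, the sample blocks, and the use of Python's `zlib` as the compressor are all hypothetical choices made for illustration. It assumes the access schedule is known ahead of time (as a compiler analysis might provide), decompresses a block just before it is accessed, and recompresses blocks that are not needed in the next few steps.

```python
import zlib


def simulate_occupancy(blocks, schedule, use_compression=True, lookahead=1):
    """Return (peak_bytes, average_bytes) of memory occupancy over a schedule.

    blocks:    dict mapping a block name to its uncompressed bytes
    schedule:  list of block names, one access per step (assumed known statically)
    lookahead: an uncompressed block is left alone if it is accessed again
               within this many upcoming steps (hypothetical policy)
    """
    # Each entry holds (is_compressed, payload).
    if use_compression:
        store = {n: (True, zlib.compress(d)) for n, d in blocks.items()}
    else:
        store = {n: (False, d) for n, d in blocks.items()}

    occupancy = []
    for step, name in enumerate(schedule):
        is_compressed, payload = store[name]
        if is_compressed:
            # Decompress right before the access.
            store[name] = (False, zlib.decompress(payload))
        # Occupancy is sampled while the accessed block is uncompressed.
        occupancy.append(sum(len(p) for _, p in store.values()))
        if use_compression:
            upcoming = schedule[step + 1 : step + 1 + lookahead]
            for other, (compressed, data) in list(store.items()):
                if not compressed and other not in upcoming:
                    store[other] = (True, zlib.compress(data))

    # Sanity check: every block still round-trips to its original contents.
    for n, (compressed, data) in store.items():
        restored = zlib.decompress(data) if compressed else data
        assert restored == blocks[n]
    return max(occupancy), sum(occupancy) / len(occupancy)


# Illustrative, highly compressible 4 KiB blocks and an access order.
blocks = {
    "a": bytes(4096),
    "b": b"ab" * 2048,
    "c": bytes(range(256)) * 16,
}
schedule = ["a", "b", "a", "c", "c", "b"]

plain = simulate_occupancy(blocks, schedule, use_compression=False)
packed = simulate_occupancy(blocks, schedule, use_compression=True)
print("uncompressed peak/avg bytes:", plain)
print("compressed   peak/avg bytes:", packed)
```

The sketch tracks only memory occupancy; it ignores the cycle and energy costs of compression and decompression that the abstract mentions, and it does not model the static or dynamic processor grouping used in the MPSoC setting.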