Adaptive data compression for high-performance low-power on-chip networks

Authors:
Yuho Jin;Ki Hwan Yum;Eun Jung Kim
Affiliations:
Department of Computer Science, Texas A&MUniversity, College Station, 77843 USA;Department of Computer Science, University of Texas at San Antonio, 78249 USA;Department of Computer Science, Texas A&MUniversity, College Station, 77843 USA
Venue:
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Year:
2008

Citing 27
Cited 4

Value locality and load value prediction

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Route packets, not wires: on-chip inteconnection networks

Proceedings of the 38th annual Design Automation Conference
Frequent value locality and value-centric data cache design

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Bus energy minimization by transition pattern coding (TPC) in deep sub-micron technologies

Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Simics: A Full System Simulation Platform

Computer
Orion: a power-performance simulator for interconnection networks

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Creating a wider bus using caching techniques

HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Proceedings of the 30th annual international symposium on Computer architecture
A Delay Model and Speculative Architecture for Pipelined Routers

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A dictionary-based en/decoding scheme for low-power data buses

IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Principles and Practices of Interconnection Networks

Principles and Practices of Interconnection Networks
Adaptive Cache Compression for High-Performance Processors

Proceedings of the 31st annual international symposium on Computer architecture
Exploring High Bandwidth Pipelined Cache Architecture for Scaled Technology

DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Unified Compressed Memory Hierarchy

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling

Proceedings of the 32nd annual international symposium on Computer Architecture
Exploiting Prediction to Reduce Power on Buses

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Interconnect-Aware Coherence Protocols for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
Leveraging Optical Technology in Future Bus-based Chip Multiprocessors

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Rotary router: an efficient architecture for CMP interconnection networks

Proceedings of the 34th annual international symposium on Computer architecture
Express virtual channels: towards the ideal interconnection fabric

Proceedings of the 34th annual international symposium on Computer architecture
On-Chip Interconnection Architecture of the Tile Processor

IEEE Micro
A 5-GHz Mesh Interconnect for a Teraflops Processor

IEEE Micro
Flattened Butterfly Topology for On-Chip Networks

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Memory-Link Compression Schemes: A Value Locality Perspective

IEEE Transactions on Computers

Exploiting address compression and heterogeneous interconnects for efficient message management in tiled CMPs

Journal of Systems Architecture: the EUROMICRO Journal
Reducing Network-on-Chip energy consumption through spatial locality speculation

NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Floodgate: application-driven flow control in network-on-chip for many-core architectures

Proceedings of the 4th International Workshop on Network on Chip Architectures
Analysis and runtime management of 3D systems with stacked DRAM for boosting energy efficiency

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe

Quantified Score

Hi-index	0.00

Visualization

Abstract

With the recent design shift towards increasing the number of processing elements in a chip, high-bandwidth support in on-chip interconnect is essential for low-latency communication. Much of the previous work has focused on router architectures and network topologies using wide/long channels. However, such solutions may result in a complicated router design and a high interconnect cost. In this paper, we exploit a table-based data compression technique, relying on value patterns in cache traffic. Compressing a large packet into a small one can increase the effective bandwidth of routers and links, while saving power due to reduced operations. The main challenges are providing a scalable implementation of tables and minimizing overhead of the compression latency. First, we propose a shared table scheme that needs one encoding and one decoding tables for each processing element, and a management protocol that does not require in-order delivery. Next, we present streamlined encoding that combines flit injection and encoding in a pipeline. Furthermore, data compression can be selectively applied to communication on congested paths only if compression improves performance. Simulation results in a 16-core CMP show that our compression method improves the packet latency by up to 44% with an average of 36% and reduces the network power consumption by 36% on average.