Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Route packets, not wires: on-chip inteconnection networks
Proceedings of the 38th annual Design Automation Conference
Frequent value locality and value-centric data cache design
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Bus energy minimization by transition pattern coding (TPC) in deep sub-micron technologies
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Creating a wider bus using caching techniques
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Scalar Operand Networks: On-Chip Interconnect for ILP in Partitioned Architectures
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
A Delay Model and Speculative Architecture for Pipelined Routers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A dictionary-based en/decoding scheme for low-power data buses
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Adaptive Cache Compression for High-Performance Processors
Proceedings of the 31st annual international symposium on Computer architecture
Exploring High Bandwidth Pipelined Cache Architecture for Scaled Technology
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Unified Compressed Memory Hierarchy
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
Exploiting Prediction to Reduce Power on Buses
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Leveraging Optical Technology in Future Bus-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Rotary router: an efficient architecture for CMP interconnection networks
Proceedings of the 34th annual international symposium on Computer architecture
Express virtual channels: towards the ideal interconnection fabric
Proceedings of the 34th annual international symposium on Computer architecture
Flattened Butterfly Topology for On-Chip Networks
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Memory-Link Compression Schemes: A Value Locality Perspective
IEEE Transactions on Computers
Journal of Systems Architecture: the EUROMICRO Journal
Reducing Network-on-Chip energy consumption through spatial locality speculation
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Floodgate: application-driven flow control in network-on-chip for many-core architectures
Proceedings of the 4th International Workshop on Network on Chip Architectures
Analysis and runtime management of 3D systems with stacked DRAM for boosting energy efficiency
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
With the recent design shift towards increasing the number of processing elements in a chip, high-bandwidth support in on-chip interconnect is essential for low-latency communication. Much of the previous work has focused on router architectures and network topologies using wide/long channels. However, such solutions may result in a complicated router design and a high interconnect cost. In this paper, we exploit a table-based data compression technique, relying on value patterns in cache traffic. Compressing a large packet into a small one can increase the effective bandwidth of routers and links, while saving power due to reduced operations. The main challenges are providing a scalable implementation of tables and minimizing overhead of the compression latency. First, we propose a shared table scheme that needs one encoding and one decoding tables for each processing element, and a management protocol that does not require in-order delivery. Next, we present streamlined encoding that combines flit injection and encoding in a pipeline. Furthermore, data compression can be selectively applied to communication on congested paths only if compression improves performance. Simulation results in a 16-core CMP show that our compression method improves the packet latency by up to 44% with an average of 36% and reduces the network power consumption by 36% on average.