Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
A case for a complexity-effective, width-partitioned microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO)
Interconnect design considerations for large NUCA caches
Proceedings of the 34th annual international symposium on Computer architecture
INTACTE: an interconnect area, delay, and energy estimation tool for microarchitectural explorations
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
The Journal of Supercomputing
An analysis of on-chip interconnection networks for large-scale chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
HiPC'07 Proceedings of the 14th international conference on High performance computing
Journal of Systems Architecture: the EUROMICRO Journal
A hybrid packet-circuit switched on-chip network based on SDM
Proceedings of the Conference on Design, Automation and Test in Europe
Exploring energy-performance trade-offs for heterogeneous interconnect clustered VLIW processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Compiler-assisted energy optimization for clustered VLIW processors
Journal of Parallel and Distributed Computing
Exploring topologies for source-synchronous ring-based network-on-chip
Proceedings of the Conference on Design, Automation and Test in Europe
A heterogeneous multiple network-on-chip design: an application-aware approach
Proceedings of the 50th Annual Design Automation Conference
A fast, source-synchronous ring-based network-on-chip design
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
DESC: energy-efficient data exchange using synchronized counters
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.05 |
Future high-performance billion-transistor processors are likely to employ partitioned architectures to achieve high clock speeds, high parallelism, low design complexity, and low power. In such architectures, inter-partition communication over global wires has a significant impact on overall processor performance and power consumption. VLSI techniques allow a variety of wire implementations, but these wire properties have previously never been exposed to the microarchitecture. This paper advocates global wire management at the microarchitecture level and proposes a heterogeneous interconnect that is comprised of wires with varying latency, bandwidth, and energy characteristics. We propose and evaluate microarchitectural techniques that can exploit such a heterogeneous interconnect to improve performance and reduce energy consumption. These techniques include a novel cache pipeline design, the identification of narrow bit-width operands, the classification of non-critical data, and the detection of interconnect load imbalance. For a dynamically scheduled partitioned architecture, our results demonstrate that the proposed innovations result in up to 11% reductions in overall processor ED^2, compared to a baseline processor that employs a homogeneous interconnect.