We present work in progress on the run-time impact of various dynamic thermal management (DTM) techniques on a proposed 1024-core XMT chip. XMT aims to improve single-task performance using fine-grained parallelism. Simulations show that, relative to a general global scheme, speedups of up to 46% with a dedicated interconnection controller and 22% with distributed control of computing clusters are possible. Our findings lead to several high-level insights that can inform the design of a broader family of shared-memory many-core systems.