Proceedings of the conference on Design, automation and test in Europe
Sora: high performance software radio using general purpose multi-core processors
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Solving Sparse Linear Systems on NVIDIA Tesla GPUs
ICCS '09 Proceedings of the 9th International Conference on Computational Science: Part I
On topology reconfiguration for defect-tolerant NoC-based homogeneous manycore systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Yield-oriented evaluation methodology of network-on-chip routing implementations
SOC'09 Proceedings of the 11th international conference on System-on-chip
Run-time task allocation considering user behavior in embedded multiprocessor networks-on-chip
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Corey: an operating system for many cores
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
MEDEA: a hybrid shared-memory/message-passing multiprocessor NoC-based architecture
Proceedings of the Conference on Design, Automation and Test in Europe
RMOT: recursion in model order for task execution time estimation in a software pipeline
Proceedings of the Conference on Design, Automation and Test in Europe
Efficient constant-time entropy decoding for H.264
Proceedings of the Conference on Design, Automation and Test in Europe
An adaptive cache coherence protocol for chip multiprocessors
Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
A NoC-based hybrid message-passing/shared-memory approach to CMP design
Microprocessors & Microsystems
Database engines on multicores, why parallelize when you can distribute?
Proceedings of the sixth conference on Computer systems
Comparison of lock thrashing avoidance methods and its performance implications for lock design
Proceedings of the third international workshop on Large-scale system and application performance
Branch penalty reduction on IBM cell SPUs via software branch hinting
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures
ACM Transactions on Embedded Computing Systems (TECS)
New basic linear algebra methods for simulation on GPUs
Proceedings of the 2011 Grand Challenges on Modeling and Simulation Conference
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
GPU-accelerated preconditioned iterative linear solvers
The Journal of Supercomputing
Yield-enhancement schemes for multicore processor and memory stacked 3D ICs
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
Multicore has shown significant performance and power advantages over single cores in commercial systems with a 2--4 cores. Applying a corollary of Moore's Law for multicore, we expect to see 1K multicore chips within a decade. 1K multicore systems introduce significant architectural challenges. One of these is the power efficiency challenge. Today's cores consume 10's of watts. Even at about one watt per core, a 1K-core chip would need to dissipate 1K watts! This paper discusses the "Kill rule for multicore" for power-efficient multicore design, an approach inspired by the "Kiss rule for RISC processor design". Kill stands for Kill if less than linear, and represents a design approach in which any additional area allocated to a resource within a core, such as a cache, is carefully traded off against using the area for additional cores. The Kill Rule states that we must increase resource size (for example, cache size) only if for every 1% increase in core area there is at least a 1% increase in core performance.