In this paper, we examine the area-performance design space of a processing core for a chip multiprocessor (CMP), considering both the architectural design space and the tradeoffs of the physical design on which the architecture relies. We first propose a methodology for performing an integrated optimization of both the micro-architecture and the physical circuit design of a microprocessor. In our approach, we use statistical and convex fitting methods to capture a large micro-architectural design space. We then characterize the area-delay tradeoffs of the underlying circuits through RTL synthesis. Finally, we establish the relationship between the architecture and the circuits in an integrative model, which we use to optimize the processor. As a case study, we apply this methodology to explore the performance-area tradeoffs in a highly parallel accelerator architecture for visual computing applications. Based on early circuit tradeoff data, our results indicate that two separate designs are performance/area optimal for our set of benchmarks: a simpler single-issue, 2-way multithreaded core running at a high frequency, and a more aggressively tuned dual-issue, 4-way multithreaded design running at a lower frequency.
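The two ingredients named above, a fitted area-delay tradeoff and a search for performance/area-optimal designs, can be illustrated with a minimal sketch. It is not the paper's model: the abstract does not specify the fitting form, so the sketch assumes a single-term power law, area = c * delay^(-k), which is linear (and hence convex) in log-log space. The function names `fit_monomial` and `pareto_front`, and all numbers, are hypothetical and chosen only for illustration.

```python
import math


def fit_monomial(delays, areas):
    """Least-squares fit of area = c * delay**(-k) in log-log space.

    Assumed model form (not taken from the paper): a monomial, which
    becomes a straight line after taking logarithms of both axes.
    Returns the pair (c, k).
    """
    xs = [math.log(d) for d in delays]
    ys = [math.log(a) for a in areas]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def pareto_front(designs):
    """Return names of designs not dominated in (area, performance).

    Each design is a (name, area, performance) tuple. A design is
    dominated if another one has no more area and no less performance,
    and is strictly better in at least one of the two.
    """
    front = []
    for name, area, perf in designs:
        dominated = any(
            a2 <= area and p2 >= perf and (a2 < area or p2 > perf)
            for _, a2, p2 in designs
        )
        if not dominated:
            front.append(name)
    return front


if __name__ == "__main__":
    # Hypothetical synthesis points: (delay, area) in arbitrary units.
    c, k = fit_monomial([1.0, 2.0, 4.0], [8.0, 4.0, 2.0])
    print(f"area ~= {c:.3f} * delay^(-{k:.3f})")

    # Hypothetical candidate cores: (name, area, performance).
    candidates = [
        ("small", 1.0, 1.0),
        ("wide", 2.0, 1.8),
        ("bloated", 2.5, 1.5),
    ]
    print(pareto_front(candidates))
```

A Pareto filter is used here because the abstract reports more than one performance/area-optimal design; a single scalar objective would return only one point.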