A universal parallel computer architecture
Selected papers of international conference on Fifth generation computer systems 92
Proceedings of the 1999 ACM symposium on Applied computing
Maps: a compiler-managed memory system for raw machines
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
An Algorithm-Hardware-System Approach to VLIW Multimedia Processors
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
Multimedia Signal Processors: An Architectural Platform with Algorithmic Compilation
Journal of VLSI Signal Processing Systems - special issue on multimedia signal processing
A Chip-Multiprocessor Architecture with Speculative Multithreading
IEEE Transactions on Computers
Proceedings of the 14th international conference on Supercomputing
Performance analysis and optimization of latency insensitive systems
Proceedings of the 37th Annual Design Automation Conference
Smart Memories: a modular reconfigurable architecture
Proceedings of the 27th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Multiple-banked register file architectures
Proceedings of the 27th annual international symposium on Computer architecture
A methodology for correct-by-construction latency insensitive design
ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
Modulo scheduling for a fully-distributed clustered VLIW architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Reducing wire delay penalty through value prediction
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Instruction distribution heuristics for quad-cluster, dynamically-scheduled, superscalar processors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the issue logic
ICS '01 Proceedings of the 15th international conference on Supercomputing
Systolic Opportunities for Multidimensional Data Streams
IEEE Transactions on Parallel and Distributed Systems
Implementing asynchronous circuits using a conventional EDA tool-flow
Proceedings of the 39th annual Design Automation Conference
Graph-partitioning based instruction scheduling for clustered processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Reducing the complexity of the register file in dynamic superscalar processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors
International Journal of Parallel Programming
Dynamic Code Partitioning for Clustered Architectures
International Journal of Parallel Programming
Runtime Reconfiguration Techniques for Efficient General-Purpose Computation
IEEE Design & Test
IEEE Micro
Coping with Latency in SOC Design
IEEE Micro
Efficient Interconnects for Clustered Microarchitectures
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Heterogeneous Clustered Processors: Organisation and Design
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Asynchronous First-in First-out Queues
PATMOS '00 Proceedings of the 10th International Workshop on Integrated Circuit Design, Power and Timing Modeling, Optimization and Simulation
CAV '99 Proceedings of the 11th International Conference on Computer Aided Verification
Dynamic frequency and voltage control for a multiple clock domain microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamically managing the communication-parallelism trade-off in future clustered processors
Proceedings of the 30th annual international symposium on Computer architecture
Multiple-path execution for chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
On-chip communication design: roadblocks and avenues
Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
VSV: L2-Miss-Driven Variable Supply-Voltage Scaling for Low Power
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Integrated temporal and spatial scheduling for extended operand clustered VLIW processors
Proceedings of the 1st conference on Computing frontiers
Floorplanning optimization with trajectory piecewise-linear model for pipelined interconnects
Proceedings of the 41st annual Design Automation Conference
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
Design and CAD Challenges in sub-90nm CMOS Technologies
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
Application adaptive energy efficient clustered architectures
Proceedings of the 2004 international symposium on Low power electronics and design
2.5D system integration: a design driven system implementation schema
Proceedings of the 2004 Asia and South Pacific Design Automation Conference
Interconnect Planning with Local Area Constrained Retiming
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Scalable selective re-execution for EDGE architectures
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Formal online methods for voltage/frequency control in multiple clock domain microprocessors
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
On-Chip Interconnects and Instruction Steering Schemes for Clustered Microarchitectures
IEEE Transactions on Parallel and Distributed Systems
Effective Instruction Prefetching via Fetch Prestaging
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Balancing clustering-induced stalls to improve performance in clustered processors
Proceedings of the 2nd conference on Computing frontiers
Combined circuit and architectural level variable supply-voltage scaling for low power
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A Distributed Control Path Architecture for VLIW Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A chip prototyping substrate: the flexible architecture for simulation and testing (FAST)
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
A power aware system level interconnect design methodology for latency-insensitive systems
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
SCMP: a single-chip message-passing parallel computer
The Journal of Supercomputing - Special issue: Parallel and distributed processing and applications
Network system design affects distributed parallel computing
InfoScale '06 Proceedings of the 1st international conference on Scalable information systems
The performance analysis of linux networking - Packet receiving
Computer Communications
Interconnect design considerations for large NUCA caches
Proceedings of the 34th annual international symposium on Computer architecture
INTACTE: an interconnect area, delay, and energy estimation tool for microarchitectural explorations
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Microarchitecture configurations and floorplanning co-optimization
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Analysis of static and dynamic energy consumption in NUCA caches: initial results
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
From synchronous to GALS: A new architecture for FPGAs
Microelectronics Journal
The SKB: a semi-completely-connected bus for on-chip systems
NPC'07 Proceedings of the 2007 IFIP international conference on Network and parallel computing
The auction: optimizing banks usage in Non-Uniform Cache Architectures
Proceedings of the 24th ACM International Conference on Supercomputing
Enhancing L2 organization for CMPs with a center cell
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Throughput optimization for latency-insensitive system with minimal queue insertion
Proceedings of the 16th Asia and South Pacific Design Automation Conference
2.5-Dimensional VLSI system integration
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Design and analysis of adaptive processor
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
A high efficient on-chip interconnection network in SIMD CMPs
ICA3PP'10 Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Compiler-assisted energy optimization for clustered VLIW processors
Journal of Parallel and Distributed Computing
Periodic scheduling of marked graphs using balanced binary words
Theoretical Computer Science
Towards efficient dynamic LLC home bank mapping with noc-level support
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Hi-index | 4.11 |
The most important physical trend facing chip architects is the fact that on-chip wires are becoming much slower relative to logic as the on-chip devices shrink. The author points out that it will soon be impossible to maintain one global clock over the entire chip, and sending signals across a billion-transistor processor may require as many as 20 cycles.