Instruction-Level Distributed Processing

Author: James E. Smith
Venue: Computer
Year: 2001

Abstract

For nearly 20 years, microarchitecture research has emphasized instruction-level parallelism (ILP), which improves performance by increasing the number of instructions executed per cycle (IPC). In striving for such parallelism, researchers have exploited advances in chip technology to develop complex, hardware-intensive processors. This trend has led to high intellectual complexity, as ever more intricate schemes squeeze performance out of second- and third-order effects. To simplify these designs, developers can borrow methods from distributed systems and apply them at the processor level to solve load-balance, resource-allocation, and communication problems. The current focus on ILP will likely shift to instruction-level distributed processing (ILDP), which emphasizes inter-instruction communication, combined with dynamic optimization and a tight interaction between hardware and low-level software. To help find runtime parallelism, orchestrate distributed hardware resources, and implement power-conservation strategies, an additional layer of abstraction, the virtual machine layer, will likely become an essential ingredient. Finally, new instruction sets may be necessary to focus more directly on instruction-level communication and dependence, rather than on computation and independence as is common today.
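The contrast between organizing execution around dependence and communication, rather than around independent computation, can be illustrated with a small dependence-based steering sketch. This is a hypothetical illustration, not the microarchitecture the paper proposes. The names `Instr` and `steer`, the model of one FIFO per processing element (PE), and the placement rules are all assumptions. Each instruction is placed behind its producer when possible, so values flow locally along chains of dependent instructions. The sketch counts how many operands still have to cross between PEs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str            # register this instruction writes
    srcs: tuple = ()     # registers this instruction reads


def steer(instrs, num_pes):
    """Hypothetical dependence-based steering of instructions to PE FIFOs.

    Returns (assignment, global_operands): the PE index chosen for each
    instruction, and how many source operands had to cross between PEs.
    """
    fifos = [[] for _ in range(num_pes)]
    producer = {}        # register -> PE holding its most recent writer
    assignment = []
    global_operands = 0
    for ins in instrs:
        target = None
        # Follow a producer that is still at the tail of its FIFO, so the
        # value is passed locally along a chain of dependent instructions.
        for src in ins.srcs:
            pe = producer.get(src)
            if pe is not None and fifos[pe][-1].dest == src:
                target = pe
                break
        if target is None:
            # Otherwise start a new chain on the shortest FIFO (an empty one
            # if any exists). FIFOs never drain in this static sketch.
            target = min(range(num_pes), key=lambda i: len(fifos[i]))
        # Count operands produced on a different PE (inter-PE communication).
        for src in ins.srcs:
            if src in producer and producer[src] != target:
                global_operands += 1
        fifos[target].append(ins)
        producer[ins.dest] = target
        assignment.append(target)
    return assignment, global_operands


# Two independent chains that meet in a final add.
program = [
    Instr("r1"),                # r1 = load
    Instr("r2", ("r1",)),       # r2 = r1 + 4
    Instr("r3"),                # r3 = load
    Instr("r4", ("r3",)),       # r4 = r3 * 2
    Instr("r5", ("r2", "r4")),  # r5 = r2 + r4
]
print(steer(program, 2))  # ([0, 0, 1, 1, 0], 1)
```

With two PEs, each chain stays on its own PE, and only the final add needs a value from the other PE. That single operand is the kind of inter-instruction communication the abstract argues a distributed design should expose and manage.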