Evaluation of multithreaded uniprocessors for commercial application environments
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
A high-frequency custom CMOS S/390 microprocessor
IBM Journal of Research and Development - Special issue: IBM S/390 G3 and G4
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A scalable front-end architecture for fast instruction delivery
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
The future of interconnection technology
IBM Journal of Research and Development
An instruction set and microarchitecture for instruction level distributed processing
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Graph-partitioning based instruction scheduling for clustered processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Architectural differences of efficient sequential and parallel computers
Journal of Systems Architecture: the EUROMICRO Journal
Dynamic binary translation for accumulator-oriented architectures
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Reducing the Latency and Area Cost of Core Swapping through Shared Helper Engines
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Dynamically configurable shared CMP helper engines for improved performance
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
An Efficient Way of Passing of Data in a Multithreaded Scheduled Dataflow Architecture
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Improving the performance and power efficiency of shared helpers in CMPs
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Architectural contesting: exposing and exploiting temperamental behavior
ACM SIGARCH Computer Architecture News - Special issue on the 2006 reconfigurable and adaptive architecture workshop
Low-overhead core swapping for thermal management
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Compiler-assisted energy optimization for clustered VLIW processors
Journal of Parallel and Distributed Computing
Hi-index | 4.10 |
For nearly 20 years, microarchitecture research has emphasized instruction-level parallelism (ILP), which improves performance by increasing the number of instructions per cycle. In striving for such parallelism, researchers have exploited advances in chip technology to develop complex, hardware-intensive processors. This trend has resulted in high intellectual complexity in the increasingly intricate schemes for squeezing performance out of second- and third-order effects. To simplify these increasingly complex designs, developers can borrow distributed- systems methods and apply them at the processor level to solve load balance, resource allocation, and communication problems. The current focus on ILP will likely shift to instruction-level distributed processing, emphasizing inter-instruction communication with dynamic optimization and a tight interaction between hardware and low-level software. To help find runtime parallelism, orchestrate distributed hardware resources, and implement power conservation strategies, an additional layer of abstraction-- the virtual machine layer--will likely become an essential ingredient. Finally, new instruction sets may be necessary to better focus on instruction-level communication and dependence, rather than computation and independence as is commonly done today.