Hardware solutions to low-level (semantic) concurrency extraction are presented, focusing on the reduction of both control-flow and data-flow inhibitors of concurrency in general-purpose and scientific instruction streams. In the first model, CONDEL-1, a control-flow model of the input code, based on the code's branch domains, is used to detect the code's reduced procedural dependencies. This model allows branches to execute concurrently. The cost and delay of the model's concurrency hardware are shown to be relatively low, especially for the detection of concurrency beyond branches. The reduced procedural dependency techniques of CONDEL-1 are combined with high-speed reduced data dependency techniques to yield a second machine model, CONDEL-2, which executes standard sequential code in a manner beyond data-flow. Simulation results are presented and analyzed, showing the model's functionality and performance improvement. The beneficial effects of limited software optimizations are also reviewed.
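
The idea of detecting reduced procedural dependencies from branch domains can be illustrated with a small software sketch. It is not the CONDEL-1 hardware algorithm, which the abstract does not spell out. It assumes, for illustration only, that a forward branch's domain is the set of instructions strictly between the branch and its target, and that an instruction depends procedurally on a branch only if it lies inside that branch's domain. The function names and the `targets` encoding are hypothetical.

```python
from typing import Dict, List, Optional, Set


def branch_domains(targets: List[Optional[int]]) -> Dict[int, Set[int]]:
    """Map each forward branch to its domain.

    targets[i] is the target index when instruction i is a forward
    branch, and None otherwise (hypothetical encoding). The domain is
    assumed to be the instructions strictly between branch and target.
    """
    domains: Dict[int, Set[int]] = {}
    for i, target in enumerate(targets):
        if target is None:
            continue
        if target <= i:
            raise ValueError("this sketch handles forward branches only")
        domains[i] = set(range(i + 1, target))
    return domains


def reduced_procedural_deps(targets: List[Optional[int]]) -> Dict[int, Set[int]]:
    """For each instruction, list the branches it must wait for.

    Assumption: an instruction depends on a branch only if it is inside
    that branch's domain. Instructions at or beyond the target execute
    whichever way the branch goes, so they carry no dependency on it.
    """
    deps: Dict[int, Set[int]] = {i: set() for i in range(len(targets))}
    for branch, domain in branch_domains(targets).items():
        for instr in domain:
            deps[instr].add(branch)
    return deps


# Six instructions: 1 is a branch to 3, and 3 is a branch to 5.
example = [None, 3, None, 5, None, None]
deps = reduced_procedural_deps(example)
```

In this example, instruction 2 depends on branch 1 and instruction 4 depends on branch 3, while branch 3 itself lies outside the domain of branch 1. Under the stated assumptions, the two branches have no procedural dependency on each other and could be evaluated concurrently, subject to data dependencies, which this sketch does not model.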