Horizontally microprogrammable CPUs belong to a class of machines with statically schedulable parallel instruction execution (SPIE machines). Several experiments have shown that, within basic blocks, real code offers a potential speedup factor of only 2 or 3 when compacted for SPIE machines, even with unlimited hardware. This paper describes similar experiments that instead measure the potential parallelism available to any global compaction method, that is, one that compacts code beyond basic-block boundaries. Global compaction is a subject of current investigation; no measurements yet exist for implemented systems. The approach is first to assume that an oracle is available during compaction. The oracle resolves all dynamic considerations in advance, making it possible to find the maximum parallelism available without reformulating the algorithm. The parallelism found is constrained only by legitimate data dependencies, since the oracle answers all questions about conditional jump directions and unresolved indirect memory references. Using such an oracle, we find that typical scientific programs may be sped up by factors ranging from 3 to 1000. These dramatic results provide an upper bound on what global compaction techniques can achieve. We also describe experiments in progress that progressively limit the oracle, with the aim of eventually producing one that supplies only information obtainable by a very good compiler; this will yield a more practical measure of the parallelism attainable through global compaction.
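To make the oracle measurement concrete, here is a minimal sketch, not taken from the paper, of how an upper bound of this kind can be computed from a dynamic instruction trace. Each instruction is placed at the earliest cycle permitted by flow (read-after-write) dependences alone; branch directions come from the trace itself, standing in for the oracle. The Insn record, the "M[...]" naming convention for memory locations, and the disambiguate_memory switch (which models one way of progressively weakening the oracle) are all illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class Insn:
        reads: tuple   # locations (registers or resolved addresses) this instruction reads
        writes: tuple  # locations it writes

    def oracle_speedup(trace, disambiguate_memory=True):
        # Schedule each trace entry at the earliest cycle allowed by flow
        # (read-after-write) dependences; anti- and output dependences are
        # assumed removable by renaming. Branch outcomes are implicit in the
        # trace, playing the oracle's role. With disambiguate_memory=False,
        # all memory names collapse to "MEM", so every load is ordered after
        # the most recent store: a crude model of an oracle that no longer
        # resolves indirect references.
        def canon(loc):
            return "MEM" if not disambiguate_memory and loc.startswith("M[") else loc

        avail = {}   # location -> cycle at which its latest value is ready
        depth = 0    # height of the resulting parallel schedule
        for insn in trace:
            cycle = 1 + max((avail.get(canon(l), 0) for l in insn.reads), default=0)
            for l in insn.writes:
                avail[canon(l)] = cycle
            depth = max(depth, cycle)
        return len(trace) / depth if depth else 1.0

    # Toy trace: a store, then a load/use chain that is independent of it.
    t = [Insn(reads=("r9",), writes=("M[0]",)),
         Insn(reads=("M[4]",), writes=("r0",)),
         Insn(reads=("r0",), writes=("r1",))]
    print(oracle_speedup(t))         # 1.5 -- the load runs alongside the store
    print(oracle_speedup(t, False))  # 1.0 -- unresolved memory serializes the chain

Scheduling against flow dependences only gives the dataflow limit that an unrestricted oracle would measure; tightening the rules step by step, for example by honoring anti-dependences or leaving memory references unresolved as above, is one plausible way to emulate the progressively limited oracles the experiments in progress describe.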