An efficient method of computing static single assignment form
POPL '89 Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Executing a Program on the MIT Tagged-Token Dataflow Architecture
IEEE Transactions on Computers
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Register allocation via graph coloring
Register allocation via graph coloring
Profile-assisted instruction scheduling
International Journal of Parallel Programming
Profile-driven instruction level parallel scheduling with application to super blocks
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Exploiting instruction level parallelism in the presence of conditional branches
Exploiting instruction level parallelism in the presence of conditional branches
Quality and speed in linear-scan register allocation
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Integrated predicated and speculative execution in the IMPACT EPIC architecture
Proceedings of the 25th annual international symposium on Computer architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
A design space evaluation of grid processor architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Garp Architecture and C Compiler
Computer
The Alpha 21264 Microprocessor
IEEE Micro
A preliminary architecture for a basic data-flow processor
ISCA '75 Proceedings of the 2nd annual symposium on Computer architecture
Treegion Scheduling for Wide Issue Processors
HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture
Proceedings of the 30th annual international symposium on Computer architecture
Systematic compilation for predicated execution
Systematic compilation for predicated execution
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Spatial computation
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Reducing control overhead in dataflow architectures
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A spatial path scheduling algorithm for EDGE architectures
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Merging Head and Tail Duplication for Convergent Hyperblock Formation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Distributed Microarchitectural Protocols in the TRIPS Prototype Processor
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Computer Systems (TOCS)
Data locality enhancement for CMPs
Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design
High performance dense linear algebra on a spatially distributed processor
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Compiling for vector-thread architectures
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Software-directed combined cpu/link voltage scaling fornoc-based cmps
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Register Bank Assignment for Spatially Partitioned Processors
Languages and Compilers for Parallel Computing
Convergent Compilation Applied to Loop Unrolling
Transactions on High-Performance Embedded Architectures and Compilers I
An evaluation of the TRIPS computer system
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Strategies for mapping dataflow blocks to distributed hardware
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Stream Compilation for Real-Time Embedded Multicore Systems
Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Using a configurable processor generator for computer architecture prototyping
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Federation: Boosting per-thread performance of throughput-oriented manycore architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Journal of Parallel and Distributed Computing
Rapid, low-power loop execution in a network of functional units
Proceedings of the 17th Panhellenic Conference on Informatics
Hi-index | 0.00 |
Explicit Data Graph Execution (EDGE) architectures offer the possibility of high instruction-level parallelism with energy efficiency. In EDGE architectures, the compiler breaks a program into a sequence of structured blocks that the hardware executes atomically. The instructions within each block communicate directly, instead of communicating through shared registers. The TRIPS EDGE architecture imposes restrictions on its blocks to simplify the microarchitecture: each TRIPS block has at most 128 instructions, issues at most 32 loads and/or stores, and executes at most 32 register bank reads and 32 writes. To detect block completion, each TRIPS block must produce a constant number of outputs (stores and register writes) and a branch decision. The goal of the TRIPS compiler is to produce TRIPS blocks full of useful instructions while enforcing these constraints. This paper describes a set of compiler algorithms that meet these sometimes conflicting goals, including an algorithm that assigns load and store identifiers to maximize the number of loads and stores within a block. We demonstrate the correctness of these algorithms in simulation on SPEC2000, EEMBC, and microbenchmarks extracted from SPEC2000 and others. We measure speedup in cycles over an Alpha 21264 on microbenchmarks.