A VLIW architecture for a trace scheduling compiler
ASPLOS II Proceedings of the second international conference on Architectual support for programming languages and operating systems
Executing compressed programs on an embedded RISC architecture
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
The multiflow trace scheduling compiler
The Journal of Supercomputing - Special issue on instruction-level parallelism
Applied cryptography (2nd ed.): protocols, algorithms, and source code in C
Applied cryptography (2nd ed.): protocols, algorithms, and source code in C
Custom-fit processors: letting applications define architectures
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exploiting data forwarding to reduce the power budget of VLIW embedded processors
Proceedings of the conference on Design, automation and test in Europe
High-quality operation binding for clustered VLIW datapaths
Proceedings of the 38th annual Design Automation Conference
Energy estimation and optimization of embedded VLIW processors based on instruction clustering
Proceedings of the 39th annual Design Automation Conference
An interleaved cache clustered VLIW processor
ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling with integrated register spilling for clustered VLIW architectures
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cluster assignment for high-performance embedded VLIW processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A run-time word-level reconfigurable coarse-grain functional unit for a VLIW processor
Proceedings of the 15th international symposium on System Synthesis
Instruction Scheduler Generation for Retargetable Compilation
IEEE Design & Test
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Register File Architecture and Compilation Scheme for Clustered ILP Processors
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A New Facility for Dynamic Control of Program Execution: DELI
EMSOFT '02 Proceedings of the Second International Conference on Embedded Software
Branch prediction techniques for low-power VLIW processors
Proceedings of the 13th ACM Great Lakes symposium on VLSI
Compiler-directed customization of ASIP cores
Proceedings of the tenth international symposium on Hardware/software codesign
Reduced code size modulo scheduling in the absence of hardware support
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
DELI: a new run-time control point
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Automatic application-specific instruction-set extensions under microarchitectural constraints
Proceedings of the 40th annual Design Automation Conference
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Inter-Cluster Communication Models for Clustered VLIW Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Overcoming the limitations of conventional vector processors
Proceedings of the 30th annual international symposium on Computer architecture
Variable Instruction Set Architecture and Its Compiler Support
IEEE Transactions on Computers
Cluster assignment of global values for clustered VLIW processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
A scalable wide-issue clustered VLIW with a reconfigurable interconnect
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Proceedings of the 2003 ACM symposium on Applied computing
Instruction Replication for Clustered Microarchitectures
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Integrated temporal and spatial scheduling for extended operand clustered VLIW processors
Proceedings of the 1st conference on Computing frontiers
Power-aware branch prediction techniques: a compiler-hints based approach for VLIW processors
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Characterizing embedded applications for instruction-set extensible processors
Proceedings of the 41st annual Design Automation Conference
Speculative software management of datapath-width for energy optimization
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Design space exploration of caches using compressed traces
Proceedings of the 18th annual international conference on Supercomputing
Automatic application-specific instruction-set extensions under microarchitectural constraints
International Journal of Parallel Programming - Special issue: Workshop on application specific processors (WASP)
Removing communications in clustered microarchitectures through instruction replication
ACM Transactions on Architecture and Code Optimization (TACO)
A New Algorithm for Energy-Driven Data Compression in VLIW Embedded Processors
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Balancing design options with Sherpa
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Scalable custom instructions identification for instruction-set extensible processors
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Procedure placement using temporal-ordering information: dealing with code size expansion
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
A scalable, clustered SMT processor for digital signal processing
MEDEA '03 Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications , systems and architecture
Evaluation of Bus Based Interconnect Mechanisms in Clustered VLIW Architectures
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
A unified processor architecture for RISC & VLIW DSP
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Using data compression in an MPSoC architecture for improving performance
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Clustered Loop Buffer Organization for Low Energy VLIW Embedded Processors
IEEE Transactions on Computers
An Energy Efficient Instruction Set Synthesis Framework for Low Power Embedded System Designs
IEEE Transactions on Computers
Low-power branch prediction techniques for VLIW architectures: a compiler-hints based approach
Integration, the VLSI Journal - Special issue: ACM great lakes symposium on VLSI
Demystifying on-the-fly spill code
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
A Simulation and Exploration Technology for Multimedia-Application-Driven Architectures
Journal of VLSI Signal Processing Systems
Distributed Data Cache Designs for Clustered VLIW Processors
IEEE Transactions on Computers
Satisfying real-time constraints with custom instructions
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Exploiting pipelining to relax register-file port constraints of instruction-set extensions
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Variable-Based Multi-module Data Caches for Clustered VLIW Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
High-Performance and Low-Cost Dual-Thread VLIW Processor Using Weld Architecture Paradigm
IEEE Transactions on Parallel and Distributed Systems
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Software and hardware techniques to optimize register file utilization in VLIW architectures
International Journal of Parallel Programming
Register aware scheduling for distributed cache clustered architecture
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
A new register file access architecture for software pipelining in VLIW processors
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Quantifying software requirements for supporting archived office documents using emulation
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
High-quality ISA synthesis for low-power cache designs in embedded microprocessors
IBM Journal of Research and Development
Proceedings of the 43rd annual Design Automation Conference
Global instruction scheduling in dynamic compilation for embedded systems
JTRES '06 Proceedings of the 4th international workshop on Java technologies for real-time and embedded systems
Instruction scheduling for a tiled dataflow architecture
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Efficient architectures through application clustering and architectural heterogeneity
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Reducing code size in VLIW instruction scheduling
Journal of Embedded Computing - Low-power Embedded Systems
Procedure placement using temporal-ordering information: Dealing with code size expansion
Journal of Embedded Computing - Cache exploitation in embedded systems
JIST: Just-In-Time scheduling translation for parallel processors
Scientific Programming
Inter-cluster communication in VLIW architectures
ACM Transactions on Architecture and Code Optimization (TACO)
Virtual Cluster Scheduling Through the Scheduling Graph
Proceedings of the International Symposium on Code Generation and Optimization
Heterogeneous Clustered VLIW Microarchitectures
Proceedings of the International Symposium on Code Generation and Optimization
Stream execution on wide-issue clustered VLIW architectures
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Interactive presentation: Time-constrained clustering for DSE of clustered VLIW-ASP
Proceedings of the conference on Design, automation and test in Europe
Improvements to the Psi-SSA representation
SCOPES '07 Proceedingsof the 10th international workshop on Software & compilers for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
An efficient framework for dynamic reconfiguration of instruction-set customization
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
INTACTE: an interconnect area, delay, and energy estimation tool for microarchitectural explorations
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Integration, the VLSI Journal
High-performance and low-power VLIW cores for numerical computations
International Journal of High Performance Computing and Networking
Proceedings of the 2008 ACM symposium on Applied computing
Accelerated AES implementations via generalized instruction set extensions
Journal of Computer Security - The Third IEEE International Symposium on Security in Networks and Distributed Systems
Journal of Signal Processing Systems
Journal of Signal Processing Systems - Special Issue: Embedded computing systems for DSP
The Instruction-Set Extension Problem: A Survey
ARC '08 Proceedings of the 4th international workshop on Reconfigurable Computing: Architectures, Tools and Applications
Inter-block Scoreboard Scheduling in a JIT Compiler for VLIW Processors
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Cross-layer customization for rapid and low-cost task preemption in multitasked embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Computation and data transfer co-scheduling for interconnection bus minimization
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
ACM Transactions on Architecture and Code Optimization (TACO)
Efficient Method to Generate an Energy Efficient Schedule Using Operation Shuffling
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Evaluation of bus based interconnect mechanisms in clustered VLIW architectures
International Journal of Parallel Programming
Tetris-XL: A performance-driven spill reduction technique for embedded VLIW processors
ACM Transactions on Architecture and Code Optimization (TACO)
Integration, the VLSI Journal
Runtime Adaptive Extensible Embedded Processors -- A Survey
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Hybrid multithreading for VLIW processors
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Using data compression for increasing memory system utilization
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Modern development methods and tools for embedded reconfigurable systems: A survey
Integration, the VLSI Journal
Low-power branch prediction techniques for VLIW architectures: a compiler-hints based approach
Integration, the VLSI Journal - Special issue: ACM great lakes symposium on VLSI
A Multi-Shared Register File Structure for VLIW Processors
Journal of Signal Processing Systems
Optimizing scheduling and intercluster connection for application-specific DSP processors
IEEE Transactions on Signal Processing
Customizing the datapath and ISA of soft VLIW processors
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
A study of energy saving in customizable processors
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
International Journal of High Performance Systems Architecture
Dynamically reconfigurable register file for a softcore VLIW processor
Proceedings of the Conference on Design, Automation and Test in Europe
Compiler-assisted power optimization for clustered VLIW architectures
Parallel Computing
Enhancing the performance of symmetric-key cryptography via instruction set extensions
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The Instruction-Set Extension Problem: A Survey
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Graph-coloring and treescan register allocation using repairing
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Making wide-issue VLIW processors viable on FPGAs
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
A fast instruction set evaluation method for ASIP designs
EUC'06 Proceedings of the 2006 international conference on Embedded and Ubiquitous Computing
ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
DESCOMP: a new design space exploration approach
ARCS'05 Proceedings of the 18th international conference on Architecture of Computing Systems conference on Systems Aspects in Organic and Pervasive Computing
An offline approach for whole-program paths analysis using suffix arrays
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Branch strategies to optimize decision trees for wide-issue architectures
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Exact custom instruction enumeration for extensible processors
Integration, the VLSI Journal
A run-time task migration scheme for an adjustable issue-slots multi-core processor
ARC'12 Proceedings of the 8th international conference on Reconfigurable Computing: architectures, tools and applications
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Compiler-assisted energy optimization for clustered VLIW processors
Journal of Parallel and Distributed Computing
An efficient multi-core SIMD implementation for H.264/AVC encoder
VLSI Design - Special issue on VLSI Circuits, Systems, and Architectures for Advanced Image and Video Compression Standards
Variability-tolerant workload allocation for MPSoC energy minimization under real-time constraints
ACM Transactions on Embedded Computing Systems (TECS)
ACM Transactions on Embedded Computing Systems (TECS)
The resource-constrained modulo scheduling problem: an experimental study
Computational Optimization and Applications
Configurable fault-tolerance for a configurable VLIW processor
ARC'13 Proceedings of the 9th international conference on Reconfigurable Computing: architectures, tools, and applications
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Support for dynamic issue width in VLIW processors using generic binaries
Proceedings of the Conference on Design, Automation and Test in Europe
Shared-port register file architecture for low-energy VLIW processors
ACM Transactions on Architecture and Code Optimization (TACO)
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Extended Instruction Exploration for Multiple-Issue Architectures
ACM Transactions on Embedded Computing Systems (TECS)
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.02 |
Lx is a scalable and customizable VLIW processor technology platform designed by Hewlett-Packard and STMicroelectronics that allows variations in instruction issue width, the number and capabilities of structures and the processor instruction set. For Lx we developed the architecture and software from the beginning to support both scalability (variable numbers of identical processing resources) and customizability (special purpose resources).In this paper we consider the following issues. When is customization or scaling beneficial? How can one determine the right degree of customization or scaling for a particular application domain? What architectural compromises were made in the Lx project to contain the complexity inherent in a customizable and scalable processor family?The experiments described in the paper show that specialization for an application domain is effective, yielding large gains in price/performance ratio. We also show how scaling machine resources scales performance, although not uniformly across all applications. Finally we show that customization on an application-by-application basis is today still very dangerous and much remains to be done for it to become a viable solution.