In this paper we report on a system that automatically designs realistic VLIW architectures highly optimized for one given application (the input to the system), while still running all other code correctly. The system uses a product-quality compiler that generates very aggressive VLIW code. We retarget the compiler until we have found a VLIW architecture ideally suited to the application on the basis of performance, a cost function, and a hardware budget. We show that we can automatically select architectures that achieve large speedups on color and image processing codes. Specialization is shown to be very valuable: the differences between architectural choices, even among reasonable-seeming architectures of similar cost, can be very large, often a factor of 5 (and sometimes much more). We also show that specialization is very dangerous: an architecture chosen to fit one algorithm can be a very poor choice for another, even in the same domain. Sometimes, however, there is an architecture near the best in cost and performance that does much better on a second algorithm.
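The search loop described above (pick a candidate architecture, evaluate it, keep the best one that fits a hardware budget) can be illustrated with a toy sketch. This is not the authors' system. It makes several assumptions, all hypothetical:

- `VLIWConfig` (a design point as counts of ALUs, multipliers, and memory ports), `hw_cost` (a linear cost model with made-up weights), and the operation counts are all invented for illustration.
- `estimated_cycles` is a crude resource-bound estimate standing in for "retarget the compiler, compile, and measure". The abstract describes a real product-quality compiler doing that job.

The sketch also shows the danger the abstract describes. A second helper, `hedged_config`, searches among designs nearly as good on one application for the one that does best on another.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class VLIWConfig:
    """Hypothetical VLIW design point: number of each functional-unit type."""
    alus: int
    multipliers: int
    mem_ports: int


def hw_cost(cfg: VLIWConfig) -> float:
    """Hypothetical linear cost model; the weights are illustrative only."""
    return 1.0 * cfg.alus + 3.0 * cfg.multipliers + 2.0 * cfg.mem_ports


def estimated_cycles(cfg: VLIWConfig, ops: dict) -> float:
    """Resource-bound estimate standing in for 'compile and measure'.

    Each operation class can only issue on its own unit type, so the
    schedule is no shorter than the busiest unit type allows.
    """
    return max(
        ops["alu"] / cfg.alus,
        ops["mul"] / cfg.multipliers,
        ops["mem"] / cfg.mem_ports,
    )


def candidates(budget: float, ranges):
    """Enumerate every design point in the (hypothetical) space within budget."""
    for alus, muls, mems in product(*ranges):
        cfg = VLIWConfig(alus, muls, mems)
        if hw_cost(cfg) <= budget:
            yield cfg


def best_config(ops: dict, budget: float, ranges) -> VLIWConfig:
    """Fastest in-budget design for one application; cheaper wins ties.

    Raises ValueError if no design point fits the budget.
    """
    return min(candidates(budget, ranges),
               key=lambda c: (estimated_cycles(c, ops), hw_cost(c)))


def hedged_config(primary: dict, secondary: dict, budget: float, ranges,
                  slack: float = 1.1) -> VLIWConfig:
    """Among designs within `slack` of the best on `primary`, return the
    one that runs `secondary` fastest (ties: primary speed, then cost)."""
    target = estimated_cycles(best_config(primary, budget, ranges), primary)
    near_best = [c for c in candidates(budget, ranges)
                 if estimated_cycles(c, primary) <= slack * target]
    return min(near_best,
               key=lambda c: (estimated_cycles(c, secondary),
                              estimated_cycles(c, primary), hw_cost(c)))


# Made-up design space and operation counts for two kernels.
RANGES = (range(1, 9), range(1, 5), range(1, 5))  # alus, multipliers, mem ports
app_a = {"alu": 600, "mul": 600, "mem": 200}      # multiply-heavy kernel
app_b = {"alu": 1200, "mul": 0, "mem": 600}       # ALU/memory-heavy kernel

if __name__ == "__main__":
    fit_a = best_config(app_a, 16, RANGES)
    fit_b = best_config(app_b, 16, RANGES)
    print(fit_a, estimated_cycles(fit_a, app_a), estimated_cycles(fit_a, app_b))
    print(fit_b, estimated_cycles(fit_b, app_b))
    safe = hedged_config(app_a, app_b, 16, RANGES)
    print(safe, estimated_cycles(safe, app_a), estimated_cycles(safe, app_b))
```

With these invented numbers and a budget of 16, the design fitted to `app_a` is `VLIWConfig(3, 3, 1)`. It runs `app_a` in 200 estimated cycles but needs 600 on `app_b`, where that kernel's own best design needs 200. The hedged choice, `VLIWConfig(3, 3, 2)`, costs slightly more but matches the 200 cycles on `app_a` and cuts `app_b` to 400. Exhaustive enumeration is used only because the toy space is tiny. A real explorer calling a compiler for every point would need a smarter search.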