Learning internal representations by error propagation
Parallel distributed processing: explorations in the microstructure of cognition, vol. 1
Parallel digital implementations of neural networks
Parallel digital implementations of neural networks
A high-performance microarchitecture with hardware-programmable functional units
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Energy-efficient signal processing via algorithmic noise-tolerance
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Fuzzy Memoization for Floating-Point Multimedia Applications
IEEE Transactions on Computers
Ultra-efficient (embedded) SOC architectures based on probabilistic CMOS (PCMOS) technology
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
CHiMPS: a high-level compilation flow for hybrid CPU-FPGA architectures
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Multifold Acceleration of Neural Network Computations Using GPU
ICANN '09 Proceedings of the 19th International Conference on Artificial Neural Networks: Part I
Conservation cores: reducing the energy of mature computations
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Green: a framework for supporting energy-conscious programming using controlled approximation
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
Understanding sources of inefficiency in general-purpose chips
Proceedings of the 37th annual international symposium on Computer architecture
Relax: an architectural framework for software recovery of hardware faults
Proceedings of the 37th annual international symposium on Computer architecture
Scalable stochastic processors
Proceedings of the Conference on Design, Automation and Test in Europe
ERSA: error resilient system architecture for probabilistic applications
Proceedings of the Conference on Design, Automation and Test in Europe
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Flikker: saving DRAM refresh-power through critical data partitioning
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Energy-Efficient Floating-Point Unit Design
IEEE Transactions on Computers
EnerJ: approximate data types for safe and general low-power computation
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Automatic abstraction and fault tolerance in cortical microachitectures
Proceedings of the 38th annual international symposium on Computer architecture
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
Dynamically Specialized Datapaths for energy efficient computing
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
MARSS: a full system simulator for multicore x86 CPUs
Proceedings of the 48th Design Automation Conference
Managing performance vs. accuracy trade-offs with loop perforation
Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
A Fault Criticality Evaluation Framework of Digital Systems for Error Tolerant Video Applications
ATS '11 Proceedings of the 2011 Asian Test Symposium
Architecture support for disciplined approximate programming
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Bundled execution of recurring traces for energy-efficient general purpose processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
QsCores: trading dark silicon for scalable energy efficiency with quasi-specific cores
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Estimations of error bounds for neural-network function approximators
IEEE Transactions on Neural Networks
A defect-tolerant accelerator for emerging high-performance applications
Proceedings of the 39th Annual International Symposium on Computer Architecture
Codesign Tradeoffs for High-Performance, Low-Power Linear Algebra Architectures
IEEE Transactions on Computers
BenchNN: On the broad potential application scope of hardware neural network accelerators
IISWC '12 Proceedings of the 2012 IEEE International Symposium on Workload Characterization (IISWC)
A general constraint-centric scheduling framework for spatial architectures
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Continuous real-world inputs can open up alternative accelerator designs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Verifying quantitative reliability for programs that execute on unreliable hardware
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Quality programmable vector processors for approximate computing
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
SAGE: self-tuning approximation for graphics engines
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Approximate storage in solid-state memories
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Paraprox: pattern-based approximation for data parallel applications
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Uncertain: a first-order type for uncertain data
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Post-compiler software optimization for reducing energy
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Hi-index | 0.00 |
This paper describes a learning-based approach to the acceleration of approximate programs. We describe the \emph{Parrot transformation}, a program transformation that selects and trains a neural network to mimic a region of imperative code. After the learning phase, the compiler replaces the original code with an invocation of a low-power accelerator called a \emph{neural processing unit} (NPU). The NPU is tightly coupled to the processor pipeline to accelerate small code regions. Since neural networks produce inherently approximate results, we define a programming model that allows programmers to identify approximable code regions -- code that can produce imprecise but acceptable results. Offloading approximable code regions to NPUs is faster and more energy efficient than executing the original code. For a set of diverse applications, NPU acceleration provides whole-application speedup of 2.3x and energy savings of 3.0x on average with quality loss of at most 9.6%.