Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Core architecture optimization for heterogeneous chip multiprocessors
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia
Proceedings of the 3rd ACM/IEEE Symposium on Architecture for networking and communications systems
Deflating the big bang: fast and scalable deep packet inspection with extended finite automata
Proceedings of the ACM SIGCOMM 2008 conference on Data communication
Introduction to the wire-speed processor and architecture
IBM Journal of Research and Development
Communications of the ACM
Ultra low latency market data feed on IBM PowerENTM
Computer Science - Research and Development
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
Efficient data streaming with on-chip accelerators: Opportunities and challenges
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Disengaged scheduling for fair, protected access to fast computational accelerators
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Exploring the design space of programmable regular expression matching accelerators
Journal of Systems Architecture: the EUROMICRO Journal
Hi-index | 0.00 |
Computation at the edge of a datacenter has unique characteristics; it deals with streaming data from multiple sources, often requiring repeated application of several standard algorithmic kernels. The demand for high data rates and power efficiency points toward hardware acceleration of key functions. These accelerators must be tightly integrated with general purpose computation to keep invocation overhead and latency low. The accelerators must be easy for software to use, and the system must be flexible enough to support evolving networking standards. In this paper, we describe and evaluate the architecture of IBM's PowerEN processor, with a focus on its on-chip hardware accelerators. PowerEN unites the throughput of application-specific accelerators with the programmability of general purpose cores on a single coherent memory architecture. Hardware acceleration improves throughput by orders of magnitude in some cases compared to equivalent computation on the general purpose cores. By offloading work to the accelerators, general purpose cores are freed to simultaneously work on computation less suited to acceleration.