Chip power consumption has reached its limits, leading to the flattening of single-core performance. We propose the 10x10 processor, a federated heterogeneous multi-core architecture in which each core is an ensemble of u-engines (micro-engines, similar to accelerators) specialized for different workload groups to achieve dramatically higher energy efficiency. The u-engines collectively target the entire general-purpose workload space. The problem we study in this article is selecting the set of workloads that each u-engine should be customized for. To address it, we study the computation structure of a wide variety of workloads and cluster together workloads with similar computation structures, the idea being that each u-engine will be customized for the compute structures exhibited by a particular cluster. The constraint on this problem is the silicon budget of a processor. Lower silicon budgets accommodate fewer u-engines and require individual u-engines to target larger segments of the workload space. This lowers the energy efficiency benefit of customization, because there is more variation among the compute structures making up each cluster. Therefore, we also study how workload coverage and benefit can be maximized for a given silicon budget. We study a broad general-purpose workload comprising 34 codes from 6 benchmark suites, identify the most frequent functions, and cluster them into 8, 16, 32, 64, and 128 clusters based on two sets of instruction usage features (high-resolution and low-resolution). We develop abstract metrics (coverage and weighted customization benefit) to evaluate the clusters. We show significant potential payoffs with four benefit models: 2-3x (square root model), 4-10x (linear model), 12-24x (quadratic model), and 22-26x (cubic model).
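The clustering and coverage ideas described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name a clustering algorithm, so plain k-means stands in for it; the three-component feature vectors (integer, floating-point, and memory instruction fractions), the execution weights, and the `coverage` definition (share of execution weight falling in the heaviest clusters that fit the budget) are hypothetical and are not the article's actual features or metrics.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def coverage(labels, weights, budget):
    """Assumed metric: fraction of total execution weight that falls in the
    `budget` heaviest clusters (one u-engine per cluster)."""
    per_cluster = {}
    for label, w in zip(labels, weights):
        per_cluster[label] = per_cluster.get(label, 0.0) + w
    top = sorted(per_cluster.values(), reverse=True)[:budget]
    return sum(top) / sum(weights)

# Hypothetical functions: (integer, floating-point, memory) instruction fractions.
features = [
    (0.80, 0.00, 0.20),
    (0.10, 0.70, 0.20),
    (0.30, 0.10, 0.60),
    (0.75, 0.05, 0.20),
    (0.15, 0.65, 0.20),
    (0.25, 0.10, 0.65),
]
# Hypothetical share of execution time spent in each function.
weights = [30, 25, 10, 20, 10, 5]

labels = kmeans(features, k=3)
for budget in (1, 2, 3):
    print(budget, coverage(labels, weights, budget))
```

In this toy setting, a smaller budget leaves more of the workload outside any customized u-engine. The article's trade-off is subtler, since fewer u-engines can also be made to cover more of the workload each, at the cost of more variation within each cluster.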