A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms
International Journal of Computer Vision
System-level exploration for pareto-optimal configurations in parameterized systems-on-a-chip
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Accurate and efficient regression modeling for microarchitectural performance and power prediction
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Efficiently exploring architectural design spaces via predictive modeling
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
A Predictive Performance Model for Superscalar Processors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient design space exploration for application specific systems-on-a-chip
Journal of Systems Architecture: the EUROMICRO Journal
Predictive design space exploration using genetically programmed response surfaces
Proceedings of the 45th annual Design Automation Conference
Journal of Systems Architecture: the EUROMICRO Journal
Prediction-Based Power-Performance Adaptation of Multithreaded Scientific Codes
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 46th Annual Design Automation Conference
Cross-based local stereo matching using orthogonal integral images
IEEE Transactions on Circuits and Systems for Video Technology
Accurate and efficient processor performance prediction via regression tree based modeling
Journal of Systems Architecture: the EUROMICRO Journal
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Architecture performance prediction using evolutionary artificial neural networks
Evo'08 Proceedings of the 2008 conference on Applications of evolutionary computing
A Safari Through the MPSoC Run-Time Management Jungle
Journal of Signal Processing Systems
A correlation-based design space exploration methodology for multi-processor systems-on-chip
Proceedings of the 47th Design Automation Conference
Proceedings of the Conference on Design, Automation and Test in Europe
Fast multidimension multichoice knapsack heuristic for MP-SoC runtime management
ACM Transactions on Embedded Computing Systems (TECS)
High-accuracy stereo depth maps using structured light
CVPR'03 Proceedings of the 2003 IEEE computer society conference on Computer vision and pattern recognition
IEEE Transactions on Evolutionary Computation
Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels
IEEE Transactions on Evolutionary Computation
System-level design: orthogonalization of concerns and platform-based design
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
Open Computing Language (OpenCL) is emerging as a standard for parallel programming of heterogeneous hardware accelerators. With respect to device specific languages, OpenCL enables application portability but does not guarantee performance portability, eventually requiring additional tuning of the implementation to a specific platform or to unpredictable dynamic workloads. In this paper, we present a methodology to analyze the customization space of an OpenCL application in order to improve performance portability and to support dynamic adaptation. We formulate our case study by implementing an OpenCL image stereo-matching application (which computes the relative depth of objects from a pair of stereo images) customized to the STMicroelectronics Platform 2012 many-core computing fabric. In particular, we use design space exploration techniques to generate a set of operating points that represent specific configurations of the parameters allowing different trade-offs between performance and accuracy of the algorithm itself. These points give detailed knowledge about the interaction between the application parameters, the underlying architecture and the performance of the system; they could also be used by a run-time manager software layer to meet dynamic Quality-of-Service (QoS) constraints. To analyze the customization space, we use cycle-accurate simulations for the target architecture. Since the profiling phase of each configuration takes a long simulation time, we designed our methodology to reduce the overall number of simulations by exploiting some important features of the application parameters; our analysis also enables the identification of the parameters that could be explored on a high-level simulation model to reduce the simulation time. The resulting methodology is one order of magnitude more efficient than an exhaustive exploration and, given its randomized nature, it increases the probability to avoid sub-optimal trade-offs.