We optimize a visual object detection application that uses Vision Video Library kernels and show that OpenCL can serve as a unified programming paradigm that delivers high performance on the Ivy Bridge heterogeneous on-chip architecture. We evaluate different mapping techniques and show that running each kernel on the device where it fits best, combined with software pipelining, can provide 1.91 times higher performance and 42% better energy efficiency. We also show how to trade accuracy for energy at runtime. Overall, our application can perform accurate object detection at 40 frames per second (fps) in an energy-efficient manner.
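Why per-kernel mapping and software pipelining complement each other can be illustrated with a simple timing model. The sketch below assumes a lot and is not the authors' implementation. The kernel names (`resize`, `features`, `classify`) and their per-device timings are hypothetical. CPU/GPU data transfer is treated as free, which is plausible when both devices share on-chip memory but is still an assumption here. The pipelined figure is an idealized steady-state bound that assumes enough frames are in flight to keep both devices busy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    """One stage of a hypothetical per-frame detection pipeline."""
    name: str
    cpu_ms: float  # assumed (illustrative) per-frame time on the CPU cores
    gpu_ms: float  # assumed (illustrative) per-frame time on the integrated GPU


def best_fit_mapping(kernels: list[Kernel]) -> dict[str, str]:
    """Assign each kernel to the device on which it runs fastest."""
    return {k.name: "cpu" if k.cpu_ms <= k.gpu_ms else "gpu" for k in kernels}


def device_load(kernels: list[Kernel], mapping: dict[str, str]) -> dict[str, float]:
    """Total per-frame busy time of each device under a given mapping."""
    load = {"cpu": 0.0, "gpu": 0.0}
    for k in kernels:
        device = mapping[k.name]
        load[device] += k.cpu_ms if device == "cpu" else k.gpu_ms
    return load


def frame_time_sequential(kernels: list[Kernel], mapping: dict[str, str]) -> float:
    """Kernels of one frame run back to back, so the devices never overlap."""
    return sum(device_load(kernels, mapping).values())


def frame_time_pipelined(kernels: list[Kernel], mapping: dict[str, str]) -> float:
    """Idealized software pipelining: while one device works on one frame, the
    other device works on a different frame, so in steady state the time per
    frame is bounded by the busiest device (transfer and sync costs ignored)."""
    return max(device_load(kernels, mapping).values())


def fps(frame_ms: float) -> float:
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


if __name__ == "__main__":
    # Hypothetical kernels and timings, chosen only to illustrate the model.
    pipeline = [
        Kernel("resize", cpu_ms=4.0, gpu_ms=2.0),
        Kernel("features", cpu_ms=10.0, gpu_ms=3.0),
        Kernel("classify", cpu_ms=5.0, gpu_ms=8.0),
    ]
    mapping = best_fit_mapping(pipeline)
    print(mapping)
    print(fps(frame_time_sequential(pipeline, mapping)))
    print(fps(frame_time_pipelined(pipeline, mapping)))
```

With these made-up numbers, best-fit mapping places `resize` and `features` on the GPU and `classify` on the CPU. This gives 10 ms per frame (100 fps) when kernels run back to back. Overlapping CPU and GPU work across frames halves that to 5 ms (200 fps) in the ideal case. Real gains depend on kernel dependencies, synchronization overhead, and how evenly the work splits between the devices.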