Rapid advances in computer hardware and networking technologies have made many-core computing affordable and readily available within a few years. Nonetheless, building scalable parallel software remains a challenge for programmers. Optimizing parallel programs for a many-core platform is a multifaceted problem in which both system and architectural factors must be taken into account. In this paper, we tackle this problem by implementing parallel programs with the different available programming paradigms and evaluating application behavior on the TILE64 many-core platform. Specifically, we investigate a hybrid producer-write plus consumer-read shared-memory programming paradigm for implementing a master-worker video decoder and encoder on this platform. Experimental results show that the proposed implementation achieves competitive speedup, scales well with the number of available cores, and delivers up to a fourfold performance improvement over other implementations when decoding a sample 1080p video.