In this work we present Cell superscalar (CellSs), which addresses the automatic exploitation of the functional parallelism of a sequential program across the different processing elements of the Cell BE architecture. The focus is on the simplicity and flexibility of the programming model. Based on a simple annotation of the source code, a source-to-source compiler generates the necessary code, and a runtime library exploits the existing parallelism by building a task dependency graph at runtime. The runtime takes care of task scheduling and data handling between the different processors of this heterogeneous architecture. In addition, a locality-aware task scheduling policy has been implemented to reduce the overhead of data transfers. The approach has been implemented and tested with a set of examples, and the results obtained so far are promising.
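The two runtime ideas named in the abstract, building a task dependency graph from annotated task parameters and placing tasks where their data already resides, can be illustrated with a small sketch. This is not the CellSs implementation or its API: the class `TaskGraph`, the methods `submit` and `schedule`, the greedy placement rule, and the task and data names are all hypothetical and chosen only for illustration. As a simplification, the sketch tracks read-after-write dependencies only and assumes one unit of work per task.

```python
from collections import defaultdict


class TaskGraph:
    """Toy runtime (hypothetical, not CellSs): records tasks in program
    order and derives read-after-write dependencies from their operands."""

    def __init__(self):
        self.tasks = []               # (name, inputs, outputs) in submission order
        self.deps = defaultdict(set)  # task index -> indices it must wait for
        self.last_writer = {}         # data name -> index of the last task writing it

    def submit(self, name, inputs=(), outputs=()):
        """Register a task; an input written by an earlier task adds an edge."""
        idx = len(self.tasks)
        self.tasks.append((name, tuple(inputs), tuple(outputs)))
        self.deps[idx] = set()
        for d in inputs:
            if d in self.last_writer:
                self.deps[idx].add(self.last_writer[d])
        for d in outputs:
            self.last_writer[d] = idx
        return idx

    def schedule(self, n_workers):
        """Greedy locality-aware placement: prefer the worker that already
        holds the most inputs, then the least-loaded worker."""
        location = {}           # data name -> worker holding the latest copy
        load = [0] * n_workers  # number of tasks assigned to each worker
        done = set()
        pending = list(range(len(self.tasks)))
        placement = []
        while pending:
            # Tasks whose producers have all been placed are ready.
            ready = [t for t in pending if self.deps[t] <= done]
            for t in ready:
                name, inputs, outputs = self.tasks[t]

                def score(w):
                    local = sum(1 for d in inputs if location.get(d) == w)
                    return (-local, load[w], w)

                w = min(range(n_workers), key=score)
                load[w] += 1
                for d in outputs:
                    location[d] = w
                placement.append((name, w))
                done.add(t)
                pending.remove(t)
        return placement


# Usage: four tasks submitted in sequential program order.
g = TaskGraph()
a = g.submit("init_a", outputs=["A"])
b = g.submit("init_b", outputs=["B"])
c = g.submit("scale_a", inputs=["A"], outputs=["A2"])
d = g.submit("add", inputs=["A2", "B"], outputs=["C"])
placement = g.schedule(n_workers=2)
print(placement)
```

In this sketch the two initialization tasks have no dependencies and are spread over both workers, while `scale_a` is placed on the worker that produced `A`, so no transfer of `A` would be needed. A real runtime would also have to handle write-after-read and write-after-write hazards, transfer costs, and limited local memory, none of which are modeled here.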