Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Chores: enhanced run-time support for shared-memory parallel computing
ACM Transactions on Computer Systems (TOCS)
Parallel programming in Split-C
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Recursion leads to automatic variable blocking for dense linear-algebra algorithms
IBM Journal of Research and Development
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
A fast Fourier transform compiler
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An annotation language for optimizing software libraries
Proceedings of the 2nd conference on Domain-specific languages
Blocking and array contraction across arbitrarily nested loops using affine partitioning
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
OpenMP: An Industry-Standard API for Shared-Memory Programming
IEEE Computational Science & Engineering
Program Structuring for Effective Parallel Portability
IEEE Transactions on Parallel and Distributed Systems
Handbook of massive data sets
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
I/O complexity: The red-blue pebble game
STOC '81 Proceedings of the thirteenth annual ACM symposium on Theory of computing
Space-limited procedures: a methodology for portable high-performance
PMMP '95 Proceedings of the conference on Programming Models for Massively Parallel Computers
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
A programming system for the imagine media processor
A programming system for the imagine media processor
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Cache oblivious stencil computations
Proceedings of the 19th annual international conference on Supercomputing
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
ClawHMMER: A Streaming HMMer-Search Implementatio
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Programming for parallelism and locality with hierarchically tiled arrays
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Compilation for explicitly managed memory hierarchies
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Exploring New Search Algorithms and Hardware for Phylogenetics: RAxML Meets the IBM Cell
Journal of VLSI Signal Processing Systems
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
CellSs: making it easier to program the cell broadband engine processor
IBM Journal of Research and Development
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
A portable runtime interface for multi-level memory hierarchies
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Merge: a programming model for heterogeneous multi-core systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Streamware: programming general-purpose multicore processors using streams
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Data exploration of turbulence simulations using a database cluster
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Multi-level tiling: M for the price of one
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Cell-SWat: modeling and scheduling wavefront computations on the cell broadband engine
Proceedings of the 5th conference on Computing frontiers
Dma-based prefetching for i/o-intensive workloads on the cell architecture
Proceedings of the 5th conference on Computing frontiers
Visions for application development on hybrid computing systems
Parallel Computing
Orchestrating data transfer for the cell/B.E. processor
Proceedings of the 22nd annual international conference on Supercomputing
Efficient computation of sum-products on GPUs through software-managed cache
Proceedings of the 22nd annual international conference on Supercomputing
Harmony: an execution model and runtime for heterogeneous many core systems
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
International Journal of Parallel, Emergent and Distributed Systems
A lightweight streaming layer for multicore execution
ACM SIGARCH Computer Architecture News
A Buffered-Mode MPI Implementation for the Cell BETM Processor
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007
A Real-Time Programming Model for Heterogeneous MPSoCs
SAMOS '08 Proceedings of the 8th international workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Stream Scheduling: A Framework to Manage Bulk Operations in Memory Hierarchies
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Radioastronomy Image Synthesis on the Cell/B.E.
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Languages and Compilers for Parallel Computing
A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
COMIC: a coherent shared memory interface for cell be
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
SPENK: adding another level of parallelism on the cell broadband engine
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Certified Reasoning in Memory Hierarchies
APLAS '08 Proceedings of the 6th Asian Symposium on Programming Languages and Systems
A comparison of programming models for multiprocessors with explicitly managed memory hierarchies
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Deriving Efficient Data Movement from Decoupled Access/Execute Specifications
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Vector stream processing for effective application of heterogeneous parallelism
Proceedings of the 2009 ACM symposium on Applied Computing
Celling SHIM: compiling deterministic concurrency to a heterogeneous multicore
Proceedings of the 2009 ACM symposium on Applied Computing
Scheduling dynamic parallelism on accelerators
Proceedings of the 6th ACM conference on Computing frontiers
Evaluating multi-core platforms for HPC data-intensive kernels
Proceedings of the 6th ACM conference on Computing frontiers
Compile-Time and Run-Time Issues in an Auto-Parallelisation System for the Cell BE Processor
Euro-Par 2008 Workshops - Parallel Processing
A Unified Runtime System for Heterogeneous Multi-core Architectures
Euro-Par 2008 Workshops - Parallel Processing
DBDB: optimizing DMATransfer for the cell be architecture
Proceedings of the 23rd international conference on Supercomputing
Rigel: an architecture and scalable programming interface for a 1000-core accelerator
Proceedings of the 36th annual international symposium on Computer architecture
Hierarchical Task-Based Programming With StarSs
International Journal of High Performance Computing Applications
Exploiting the Cell/BE Architecture with the StarPU Unified Runtime System
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Exposing non-standard architectures to embedded software using compile-time virtualisation
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Co-processor acceleration of an unmodified parallel solid mechanics code with FEASTGPU
International Journal of Computational Science and Engineering
Towards a framework for abstracting accelerators in parallel applications: experience with cell
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Design and use of htalib: a library for hierarchically tiled arrays
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
State-of-the-art in heterogeneous computing
Scientific Programming
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Data-aware scheduling of legacy kernels on heterogeneous platforms with distributed memory
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Low depth cache-oblivious algorithms
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
SAMOS'09 Proceedings of the 9th international conference on Systems, architectures, modeling and simulation
MapReduce for the cell broadband engine architecture
IBM Journal of Research and Development
A locality model for the real-time specification for Java
Proceedings of the 8th International Workshop on Java Technologies for Real-Time and Embedded Systems
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A balanced programming model for emerging heterogeneous multicore systems
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Task superscalar: using processors as functional units
HotPar'10 Proceedings of the 2nd USENIX conference on Hot topics in parallelism
Efficient OpenMP data mapping for multicore platforms with vertically stacked memory
Proceedings of the Conference on Design, Automation and Test in Europe
Recursion-driven parallel code generation for multi-core platforms
Proceedings of the Conference on Design, Automation and Test in Europe
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Towards metaprogramming for parallel systems on a chip
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Automatic calibration of performance models on heterogeneous multicore architectures
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Algorithm engineering: bridging the gap between algorithm theory and practice
Algorithm engineering: bridging the gap between algorithm theory and practice
Compiler-directed memory management for heterogeneous MPSoCs
Journal of Systems Architecture: the EUROMICRO Journal
Task Superscalar: An Out-of-Order Task Pipeline
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Communications of the ACM
Programming the memory hierarchy revisited: supporting irregular parallelism in sequoia
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
SpiceC: scalable parallelism via implicit copying and explicit commit
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Accelerating CUDA graph algorithms at maximum warp
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
DDM-VMc: the data-driven multithreading virtual machine for the cell processor
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Sponge: portable stream programming on graphics engines
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
A parallel numerical solver using hierarchically tiled arrays
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Unified parallel C for GPU clusters: language extensions and compiler implementation
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Region-based parallelization of irregular reductions on explicitly managed memory hierarchies
The Journal of Supercomputing
Programming heterogeneous clusters with accelerators using object-based programming
Scientific Programming
Transactions on high-performance embedded architectures and compilers III
Improving programmability of heterogeneous many-core systems via explicit platform descriptions
Proceedings of the 4th International Workshop on Multicore Software Engineering
A programming model for deterministic task parallelism
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Scalable heterogeneous parallelism for atmospheric modeling and simulation
The Journal of Supercomputing
A runtime implementation of OpenMP tasks
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
Automatic analysis of DMA races using model checking and k-induction
Formal Methods in System Design
Green challenges to system software in data centers
Frontiers of Computer Science in China
A fast, GPU based, dictionary attack to OpenPGP secret keyrings
Journal of Systems and Software
Obsidian: a domain specific embedded language for parallel programming of graphics processors
IFL'08 Proceedings of the 20th international conference on Implementation and application of functional languages
Optimizing explicit data transfers for data parallel applications on the cell architecture
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Programmable data dependencies and placements
DAMP '12 Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming
DOJ: dynamically parallelizing object-oriented programs
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
BDDT:: block-level dynamic dependence analysisfor deterministic task-based parallelism
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Hierarchical place trees: a portable abstraction for task parallelism and data movement
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Offload – automating code migration to heterogeneous multicore systems
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Towards a codelet-based runtime for exascale computing: position paper
Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Decoupling algorithms from schedules for easy optimization of image processing pipelines
ACM Transactions on Graphics (TOG) - SIGGRAPH 2012 Conference Proceedings
The myrmics memory allocator: hierarchical,message-passing allocation for global address spaces
Proceedings of the 2012 international symposium on Memory Management
Integrated in-system storage architecture for high performance computing
Proceedings of the 2nd International Workshop on Runtime and Operating Systems for Supercomputers
CHARM: a composable heterogeneous accelerator-rich microprocessor
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Extendable pattern-oriented optimization directives
ACM Transactions on Architecture and Code Optimization (TACO)
A multi-objective auto-tuning framework for parallel codes
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Legion: expressing locality and independence with logical regions
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Designing a unified programming model for heterogeneous machines
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Scheduling streaming applications on a complex multicore platform
Concurrency and Computation: Practice & Experience
Parallelization strategies for the points of interests algorithm on the cell processor
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
A synchronous mode MPI implementation on the cell BETM architecture
ISPA'07 Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
The Journal of Supercomputing
Abstractions and Middleware for Petascale Computing and Beyond
International Journal of Distributed Systems and Technologies
Work-stealing with configurable scheduling strategies
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Implementing OmpSs support for regions of data in architectures with multiple address spaces
Proceedings of the 27th international ACM conference on International conference on supercomputing
Locality-aware task management for unstructured parallelism: a quantitative limit study
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Language support for dynamic, hierarchical data partitioning
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Tomahawk: Parallelism and heterogeneity in communications signal processing MPSoCs
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Analysis of Recursively Parallel Programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
RSVM: a region-based software virtual memory for GPU
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Optimizing two-dimensional DMA transfers for scratchpad Based MPSoCs platforms
Microprocessors & Microsystems
Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems
Scientific Programming - Selected Papers from Super Computing 2012
Hi-index | 0.02 |
We present Sequoia, a programming language designed to facilitate the development of memory hierarchy aware parallel programs that remain portable across modern machines featuring different memory hierarchy configurations. Sequoia abstractly exposes hierarchical memory in the programming model and provides language mechanisms to describe communication vertically through the machine and to localize computation to particular memory locations within it. We have implemented a complete programming system, including a compiler and runtime systems for Cell processor-based blade systems and distributed memory clusters, and demonstrate efficient performance running Sequoia programs on both of these platforms.