Static scheduling of synchronous data flow programs for digital signal processing
IEEE Transactions on Computers
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A novel renaming mechanism that boosts software prefetching
ICS '01 Proceedings of the 15th international conference on Supercomputing
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Media Processing Applications on the Imagine Stream Processor
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
ICCD '02 Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02)
A programming system for the imagine media processor
A programming system for the imagine media processor
Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing
Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Impulse: Memory system support for scientific applications
Scientific Programming
Data and Computation Transformations for Brook Streaming Applications on Multiprocessors
Proceedings of the International Symposium on Code Generation and Optimization
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Stream execution on wide-issue clustered VLIW architectures
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Streamware: programming general-purpose multicore processors using streams
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Orchestrating the execution of stream programs on multicore platforms
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Optimizing scientific application loops on stream processors
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
A lightweight streaming layer for multicore execution
ACM SIGARCH Computer Architecture News
Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
MPSoC Design Using Application-Specific Architecturally Visible Communication
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Push-assisted migration of real-time tasks in multi-core processors
Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Applying the Stream-Based Computing Model to Design Hardware Accelerators: A Case Study
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Flexible filters: load balancing through backpressure for stream programs
EMSOFT '09 Proceedings of the seventh ACM international conference on Embedded software
Streaming HD H.264 encoder on programmable processors
MM '09 Proceedings of the 17th ACM international conference on Multimedia
PDCS '07 Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems
Input-driven dynamic execution prediction of streaming applications
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Design and implementation of a graphical user interface for stream-based distributed computing
PDCN '08 Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks
An analytical model to exploit memory task scheduling
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
Minimizing communication in rate-optimal software pipelining for stream programs
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Stream image processing on a dual-core embedded system
SAMOS'07 Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation
An MPI-Stream Hybrid Programming Model for Computational Clusters
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Memory Latency Reduction via Thread Throttling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Log-based architectures: using multicore to help software behave correctly
ACM SIGOPS Operating Systems Review
Orchestration by approximation: mapping stream programs onto multicore architectures
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Inter-core prefetching for multicore processors using migrating helper threads
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Streaming-oriented parallelization of domain-independent irregular kernels
Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
Parallel processing on real-time gesture recognition system
Proceedings of the 10th International Conference on Virtual Reality Continuum and Its Applications in Industry
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors
The Journal of Supercomputing
ACM Transactions on Architecture and Code Optimization (TACO)
Mapping streaming languages to general purpose processors through vectorization
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Adaptive task duplication using on-line bottleneck detection for streaming applications
Proceedings of the 9th conference on Computing Frontiers
Profile-guided deployment of stream programs on multicores
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
StreamPI: a stream-parallel programming extension for object-oriented programming languages
The Journal of Supercomputing
Exploiting Task- and Data-Level Parallelism in Streaming Applications Implemented in FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Flexible filters in stream programs
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
In this paper we investigate mapping stream programs (i.e., programs written in a streaming style for streaming architectures such as Imagine and Raw) onto a general-purpose CPU. We develop and explore a novel way of mapping these programs onto the CPU. We show how the salient features of stream programming such as computation kernels, local memories, and asynchronous bulk memory loads and stores can be easily mapped by a simple compilation system to CPU features such as the processor caches, simultaneous multi-threading, and fast inter-thread communication support, resulting in an executable that efficiently uses CPU resources. We present an evaluation of our mapping on a hyperthreaded Intel Pentium 4 CPU as a canonical example of a general-purpose processor. We compare the mapped stream program against the same program coded in a more conventional style for the general-purpose processor. Using both micro-benchmarks and scientific applications we show that programs written in a streaming style can run comparably to equivalent programs written in a traditional style. Our results show that coding programs in a streaming style can improve performance on today驴s machines and smooth the way for significant performance improvements with the deployment of streaming architectures.