Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A stream compiler for communication-exposed architectures
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
StreamIt: A Language for Streaming Applications
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Performance optimizations and bounds for sparse matrix-vector multiply
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Stream Programming on General-Purpose Processors
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
DRAMsim: a memory system simulator
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
McRT-STM: a high performance software transactional memory system for a multi-core runtime
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiling for stream processing
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Accelerator: using data parallelism to program GPUs for general-purpose uses
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Thread clustering: sharing-aware scheduling on SMP-CMP-SMT multiprocessors
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Dryad: distributed data-parallel programs from sequential building blocks
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Enabling scalability and performance in a large scale CMP environment
Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Executing irregular scientific applications on stream architectures
Proceedings of the 21st annual international conference on Supercomputing
Architectural Support for the Stream Execution Model on General-Purpose Processors
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
A portable runtime interface for multi-level memory hierarchies
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
A comparison of programming models for multiprocessors with explicitly managed memory hierarchies
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
SLIPstream: scalable low-latency interactive perception on streaming data
Proceedings of the 18th international workshop on Network and operating systems support for digital audio and video
The canals language and its compiler
Proceedings of th 12th International Workshop on Software and Compilers for Embedded Systems
An analytical model to exploit memory task scheduling
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
Storing and accessing live mashup content in the cloud
ACM SIGOPS Operating Systems Review
On-chip communication and synchronization mechanisms with cache-integrated network interfaces
Proceedings of the 7th ACM international conference on Computing frontiers
Feedback-directed pipeline parallelism
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Transformation-based parallelization of request-processing applications
MODELS'10 Proceedings of the 13th international conference on Model driven engineering languages and systems: Part II
Memory Latency Reduction via Thread Throttling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Streaming Data Movement for Real-Time Image Analysis
Journal of Signal Processing Systems
Optimizing modulo scheduling to achieve reuse and concurrency for stream processors
The Journal of Supercomputing
ACM Transactions on Architecture and Code Optimization (TACO)
Mapping streaming languages to general purpose processors through vectorization
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Adaptive task duplication using on-line bottleneck detection for streaming applications
Proceedings of the 9th conference on Computing Frontiers
DAG3: a tool for design and analysis of applications for multicore architectures
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Optimization schemes and performance evaluation of Smith–Waterman algorithm on CPU, GPU and FPGA
Concurrency and Computation: Practice & Experience
Sigma*: symbolic learning of input-output specifications
POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Prefetching and cache management using task lifetimes
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
Recently, the number of cores on general-purpose processors has been increasing rapidly. Using conventional programming models, it is challenging to effectively exploit these cores for maximal performance. An interesting alternative candidate for programming multiple cores is the stream programming model, which provides a framework for writing programs in a sequential-style while greatly simplifying the task of automatic parallelization. It has been shown that not only traditional media/image applications but also more general-purpose data-intensive applications can be expressed in the stream programming style. In this paper, we investigate the potential to use the stream programming model to efficiently utilize commodity multicore general-purpose processors (e.g., Intel/AMD). Although several stream languages and stream compilers have recently been developed, they typically target special-purpose stream processors. In contrast, we propose a flexible software system, Streamware, which automatically maps stream programs onto a wide variety of general-purpose multicore processor configurations. We leverage existing compilation framework for stream processors and design a runtime environment which takes as input the output of these stream compilers in the form of machine-independent stream virtual machine code. The runtime environment assigns work to processor cores considering processor/cache configurations and adapts to workload variations. We evaluate this approach for a few general-purpose scientific applications on real hardware and a cycle-level simulator set-up to showcase scaling and contention issues. The results show that the stream programming model is a good choice for efficiently exploiting modern and future multicore CPUs for an important class of applications.