The emergence of multicore processors has heightened the need for effective parallel programming practices. In addition to writing new parallel programs, the next generation of programmers will be faced with the overwhelming task of migrating decades' worth of legacy C code into a parallel representation. Addressing this problem requires a toolset of parallel programming primitives that can broadly apply to both new and existing programs. While tools such as threads and OpenMP allow programmers to express task and data parallelism, support for pipeline parallelism is distinctly lacking. In this paper, we offer a new and pragmatic approach to leveraging coarse-grained pipeline parallelism in C programs. We target the domain of streaming applications, such as audio, video, and digital signal processing, which exhibit regular flows of data. To exploit pipeline parallelism, we equip the programmer with a simple set of annotations (indicating pipeline boundaries) and a dynamic analysis that tracks all communication across those boundaries. Our analysis outputs a stream graph of the application as well as a set of macros for parallelizing the program and communicating the data needed. We apply our methodology to six case studies, including MPEG-2 decoding, MP3 decoding, GMTI radar processing, and three SPEC benchmarks. Our analysis extracts a useful block diagram for each application, and the parallelized versions offer a 2.78x mean speedup on a 4-core machine.
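To make the idea of boundary annotations plus a dynamic communication analysis concrete, the following is a minimal sketch in C. It is not the authors' tool or macro set: every name here (`begin_iteration`, `stage_boundary`, `load`, `store`, `edge_words`, `run_demo`) is hypothetical, and the sketch assumes that memory accesses are instrumented by hand through `load` and `store` wrappers over a small word-addressed array. It records which stage last wrote each word and, on each read from a different stage, counts one word of traffic on the corresponding producer-to-consumer edge, which yields a toy stream graph.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: none of these names come from the paper. */
#define MAX_STAGES 4
#define MEM_WORDS  64

int mem[MEM_WORDS];                               /* toy program memory */
int current_stage;                                /* stage now executing */
int last_writer[MEM_WORDS];                       /* last stage to write a word, -1 if none */
unsigned long edge_words[MAX_STAGES][MAX_STAGES]; /* words read by stage j that stage i wrote */

void trace_reset(void) {
    current_stage = 0;
    memset(mem, 0, sizeof mem);
    memset(edge_words, 0, sizeof edge_words);
    for (int i = 0; i < MEM_WORDS; i++) last_writer[i] = -1;
}

/* Annotation (assumed form): start of one iteration of the streaming loop. */
void begin_iteration(void) { current_stage = 0; }

/* Annotation (assumed form): boundary between two pipeline stages. */
void stage_boundary(void) {
    assert(current_stage + 1 < MAX_STAGES);
    current_stage++;
}

/* Instrumented store: remember which stage produced this word. */
void store(int addr, int value) {
    assert(addr >= 0 && addr < MEM_WORDS);
    last_writer[addr] = current_stage;
    mem[addr] = value;
}

/* Instrumented load: a read of a word written by another stage is
   communication across a boundary, so count it on that edge. */
int load(int addr) {
    assert(addr >= 0 && addr < MEM_WORDS);
    int producer = last_writer[addr];
    if (producer >= 0 && producer != current_stage)
        edge_words[producer][current_stage]++;
    return mem[addr];
}

/* Print every non-empty edge of the recorded stream graph. */
void print_stream_graph(void) {
    for (int i = 0; i < MAX_STAGES; i++)
        for (int j = 0; j < MAX_STAGES; j++)
            if (edge_words[i][j] > 0)
                printf("stage %d -> stage %d: %lu words\n", i, j, edge_words[i][j]);
}

/* Toy three-stage streaming loop: produce, transform, accumulate. */
void run_demo(int iterations) {
    trace_reset();
    for (int it = 0; it < iterations; it++) {
        begin_iteration();
        for (int i = 0; i < 4; i++) store(i, it + i);            /* stage 0 */
        stage_boundary();
        for (int i = 0; i < 4; i++) store(4 + i, 2 * load(i));   /* stage 1 */
        stage_boundary();
        for (int i = 0; i < 4; i++)                              /* stage 2 */
            store(8, load(8) + load(4 + i));
    }
}
```

Calling `run_demo(3)` followed by `print_stream_graph()` from a `main` function reports one edge from stage 0 to stage 1 and one from stage 1 to stage 2. The accumulator at word 8 is only touched inside stage 2, so it produces no edge. In this sketch, a read in an early stage of a word written by a later stage in the previous iteration would appear as a backward edge, which is the kind of dependence a programmer would need to inspect before pipelining the loop.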