With the increasing use of multi-core microprocessors and hardware accelerators in embedded media processing systems, there is an increasing need to discover coarse-grained parallelism in media applications written in C and C++. Common versions of these codes use a pointer-heavy, sequential programming model to implement algorithms with high levels of inherent parallelism. The lack of automated tools capable of discovering this parallelism has hampered the productivity of parallel programmers and application-specific hardware designers, as well as inhibited the development of automatic parallelizing compilers. Automatic discovery is challenging due to shifts in the prevalent programming languages, scalability problems of analysis techniques, and the lack of experimental research in combining the numerous analyses necessary to achieve a clear view of the relations among memory accesses in complex programs. This paper is based on a coherent prototype system designed to automatically find multiple levels of coarse-grained parallelism. It visits several of the key analyses that are necessary to discover parallelism in contemporary media applications, distinguishing those that perform satisfactorily at this time from those that do not yet have practical, scalable solutions. We show that, contrary to common belief, a compiler with a strong, synergistic portfolio of modern analysis capabilities can automatically discover a very substantial amount of coarse-grained parallelism in complex media applications such as an MPEG-4 encoder. These results suggest that an automatic coarse-grained parallelism discovery tool can be built to greatly enhance the software and hardware development processes of future embedded media processing systems.