Communications of the ACM - Special section on computer architecture
The Manchester prototype dataflow computer
Communications of the ACM - Special section on computer architecture
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
Effect of storage allocation/reclamation methods on parallelism and storage requirements
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Characterizations of parallelism in applications and their use in scheduling
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Compiler algorithms for event variable synchronization
ICS '91 Proceedings of the 5th international conference on Supercomputing
Another view on parallel speedup
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The Processor Working Set and its Use in Scheduling Multiprocessor Systems
IEEE Transactions on Software Engineering
Dynamic Processor Self-Scheduling for General Parallel Nested Loops
IEEE Transactions on Computers
Scheduling parallel programs with non-uniform parallelism profiles
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Dynamic dependency analysis of ordinary programs
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
On the limits of program parallelism and its smoothability
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Extraction of massive instruction level parallelism
ACM SIGARCH Computer Architecture News
Array privatization for shared and distributed memory machines (extended abstract)
ACM SIGPLAN Notices - Workshop on languages, compilers and run-time environments for distributed memory multiprocessors
Static and dynamic evaluation of data dependence analysis
ICS '93 Proceedings of the 7th international conference on Supercomputing
On multiprocessor system scheduling
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Static and Dynamic Evaluation of Data Dependence Analysis Techniques
IEEE Transactions on Parallel and Distributed Systems
Measuring limits of parallelism and characterizing its vulnerability to resource constraints
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
STOC '97 Proceedings of the twenty-ninth annual ACM symposium on Theory of computing
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Preemptive scheduling of parallel jobs on multiprocessors
Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms
Source-to-Source Instrumentation for the Optimization of an Automatic Reading System
The Journal of Supercomputing
Architectural differences of efficient sequential and parallel computers
Journal of Systems Architecture: the EUROMICRO Journal
Cost and Time-Cost Effectiveness of Multiprocessing
IEEE Transactions on Parallel and Distributed Systems
Loop-Level Parallelism in Numeric and Symbolic Programs
IEEE Transactions on Parallel and Distributed Systems
An Architecture-Independent Workload Characterization Model for Parallel Computer Architectures
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
On-line scheduling of scalable real-time tasks on multiprocessor systems
Journal of Parallel and Distributed Computing
Parallel program performance prediction using deterministic task graph analysis
ACM Transactions on Computer Systems (TOCS)
Non-clairvoyant multiprocessor scheduling of jobs with changing execution characteristics
Journal of Scheduling - Special issue: On-line scheduling
The impact of x86 instruction set architecture on superscalar processing
Journal of Systems Architecture: the EUROMICRO Journal
Journal of Parallel and Distributed Computing
P³T+: A performance estimator for distributed and parallel programs
Scientific Programming
Quantifying ILP by means of graph theory
Proceedings of the 2nd international conference on Performance evaluation methodologies and tools
Toward a better parallel performance metric
Parallel Computing
Study of Algorithmic and Architectural Characteristics of Gaussian Particle Filters
Journal of Signal Processing Systems
Kremlin: like gprof, but for parallelization
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Energy-efficient scheduling for parallel real-time tasks based on level-packing
Proceedings of the 2011 ACM Symposium on Applied Computing
Kremlin: rethinking and rebooting gprof for the multicore age
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Parkour: parallel speedup estimates for serial programs
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism
Kismet: parallel speedup estimates for serial programs
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Parallelization of utility programs based on behavior phase analysis
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Dynamic trace-based analysis of vectorization potential of applications
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential
ACM Transactions on Architecture and Code Optimization (TACO)
Vector seeker: a tool for finding vector potential
Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing
Describes COMET (concurrency measurement tool), a software tool for measuring parallelism in large scientific/engineering applications. The proposed tool measures the total parallelism present in programs. It filters out the effects of communication/synchronization delays, finite storage, a limited number of processors, and the policies for managing processors and storage. Although an ideal machine that can exploit the total parallelism is not realizable, such measures would aid the calibration and design of various architectures/compilers. The tool accepts ordinary Fortran programs as input, so parallelism can easily be measured on many fairly large programs. Some parallelism measurements obtained with this tool are also reported. The average parallelism observed in the chosen programs ranges from 500 to 3500 Fortran statements executing concurrently in each clock cycle in an idealized environment.
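The idealized measure described above can be illustrated with a small sketch. The sketch assumes a shadow-timestamp scheme, which is a common way to compute ideal parallelism but not necessarily COMET's exact mechanism. Each executed statement is scheduled one cycle after the latest of its operands becomes available. Average parallelism is then the number of executed statements divided by the length of the resulting critical path. The trace format, the function name `ideal_parallelism`, and the one-cycle-per-statement cost are hypothetical choices for illustration, not the tool's actual interface.

```python
from typing import Dict, List, Sequence, Tuple

# One executed statement: (destination variable, source variables it reads).
# This trace format is a hypothetical illustration, not COMET's input format.
Stmt = Tuple[str, Sequence[str]]


def ideal_parallelism(trace: List[Stmt]) -> Tuple[int, int, float]:
    """Return (statements, critical_path_cycles, average_parallelism).

    Assumes every statement takes one clock cycle, unlimited processors,
    zero communication delay, and unlimited storage (renaming), so only
    true (flow) dependences constrain the schedule. Control dependences
    are ignored because the trace is already a single executed path.
    """
    # Shadow timestamp: cycle in which each variable's current value is ready.
    # Variables never written in the trace (program inputs) are ready at cycle 0.
    ready: Dict[str, int] = {}
    critical_path = 0
    for dest, sources in trace:
        # A statement can run one cycle after its last operand is produced.
        cycle = max((ready.get(s, 0) for s in sources), default=0) + 1
        # With unlimited storage the new value gets its own timestamp, so
        # anti- and output dependences on `dest` impose no delay.
        ready[dest] = cycle
        critical_path = max(critical_path, cycle)
    n = len(trace)
    avg = n / critical_path if critical_path else 0.0
    return n, critical_path, avg


if __name__ == "__main__":
    # a = f(x); b = g(y); c = a + b; d = h(x); e = c * d
    trace = [("a", ["x"]), ("b", ["y"]), ("c", ["a", "b"]),
             ("d", ["x"]), ("e", ["c", "d"])]
    n, cp, avg = ideal_parallelism(trace)
    # prints "5 statements, critical path 3 cycles, average parallelism 1.67"
    print(f"{n} statements, critical path {cp} cycles, average parallelism {avg:.2f}")
```

In this example, `a`, `b`, and `d` can all execute in the first cycle, `c` in the second, and `e` in the third. A real tool would apply the same bookkeeping to millions of executed statements, which is how averages in the hundreds or thousands can arise.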