A study of the effects of adding two scan primitives as unit-time primitives to PRAM (parallel random access machine) models is presented. It is shown that the primitives improve the asymptotic running time of many algorithms by an O(log n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. It is argued that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. The author describes five algorithms that illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimum-spanning-tree algorithm, a line-drawing algorithm, and a merging algorithm. All five run on an EREW (exclusive read, exclusive write) PRAM extended with the two scan primitives, and each is either simpler or more efficient than its pure PRAM counterpart. The scan primitives have been implemented in microcode on the Connection Machine system and are available in PARIS (the parallel instruction set of the machine).
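To make the idea of a scan primitive concrete, the following is a minimal sequential sketch in Python. It is not the Connection Machine or PARIS implementation. The abstract does not name the two primitives, so the sketch assumes they are an exclusive plus-scan and an exclusive max-scan. The function names (`exclusive_scan`, `plus_scan`, `max_scan`, `split`, `split_radix_sort`) are hypothetical and chosen only for illustration. The radix sort shown is one common scan-based formulation (a stable split on each key bit) and may differ in detail from the algorithm in the paper.

```python
def exclusive_scan(xs, op, identity):
    """Exclusive scan: out[i] combines xs[0..i-1] with op; out[0] is identity."""
    out, acc = [], identity
    for x in xs:
        out.append(acc)
        acc = op(acc, x)
    return out


def plus_scan(xs):
    """Assumed primitive 1: exclusive prefix sums."""
    return exclusive_scan(xs, lambda a, b: a + b, 0)


def max_scan(xs):
    """Assumed primitive 2: exclusive running maximum."""
    return exclusive_scan(xs, max, float("-inf"))


def split(keys, flags):
    """Stable partition: keys with flag 0 first, then keys with flag 1.

    Each element computes its destination index from a plus-scan, so on a
    machine with unit-time scans the whole step takes constant time.
    """
    not_flags = [1 - f for f in flags]
    index_down = plus_scan(not_flags)          # destinations for flag-0 keys
    zeros = sum(not_flags)                     # flag-1 keys start after these
    index_up = [zeros + s for s in plus_scan(flags)]
    out = [None] * len(keys)
    for i, key in enumerate(keys):             # conceptually one parallel write
        out[index_up[i] if flags[i] else index_down[i]] = key
    return out


def split_radix_sort(keys, bits):
    """Sort non-negative integers by splitting on each bit, lowest bit first."""
    for b in range(bits):
        keys = split(keys, [(k >> b) & 1 for k in keys])
    return keys


if __name__ == "__main__":
    print(plus_scan([2, 1, 2, 3, 5, 8, 13, 21]))
    print(split_radix_sort([5, 7, 3, 1, 4, 2, 7, 2], bits=3))
```

In this sketch each pass of `split_radix_sort` uses a fixed number of scans and one permutation write. If scans are counted as unit-time operations, the sort therefore needs a number of steps proportional to the key width in bits rather than to the input length.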