Priority Queues and Sorting Methods for Parallel Simulation

Authors:
Miltos D. Grammatikakis;Stefan Liesche
Affiliations:
-;-
Venue:
IEEE Transactions on Software Engineering
Year:
2000


Abstract

We examine the design, implementation, and experimental analysis of parallel priority queues for device and network simulation. We consider: 1) distributed splay trees using MPI, 2) concurrent heaps using shared memory atomic locks, and 3) a new, more general concurrent data structure based on distributed sorted lists, which is designed to provide dynamically balanced work allocation (with automatic or manual control) and efficient use of shared memory resources. We evaluate the performance of all three data structures on a Cray-T3E900 system at KFA-Jülich. Our comparisons are based on simulations of single buffers and of a $64 \times 64$ packet switch that supports multicasting. In all implementations, PEs monitor traffic at their preassigned input/output ports, while priority queue elements are distributed across the Cray-T3E virtual shared memory. Our experiments with up to 60,000 packets and 2 to 64 PEs indicate that concurrent priority queues perform much better than distributed ones. Both concurrent implementations have comparable performance, while our new data structure uses less memory and has been further optimized. We also consider parallel simulation of symmetric networks by sorting integer conflict functions and implementing an interesting packet indexing scheme. The optimized message passing network simulator can process $\sim 500$K packet moves per second, with an efficiency above $\sim 50$ percent for a few thousand packets on the Cray-T3E with 32 PEs. All developed data structures now form a parallel library. Although our concurrent implementations use the Cray-T3E ShMem library, portability can be achieved through the OpenMP or MPI-2 standard libraries, which will provide support for one-way communication and shared memory lock mechanisms.
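
As a rough illustration of the second approach named in the abstract (a concurrent heap protected by locks), the following minimal sketch shows a single-lock event queue keyed by packet timestamp. It is not the authors' implementation: Python's `threading.Lock` and `heapq` stand in for the Cray-T3E ShMem atomic locks and the distributed heap, and the names `LockedEventQueue`, `run_demo`, and the arrival-time formula are hypothetical choices made only for this example.

```python
import heapq
import threading


class LockedEventQueue:
    """Binary min-heap of (timestamp, seq, payload) guarded by one lock."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()
        self._seq = 0  # tie-breaker: equal timestamps pop in insertion order

    def insert(self, timestamp, payload):
        # The whole heap is locked for the duration of the push.
        with self._lock:
            heapq.heappush(self._heap, (timestamp, self._seq, payload))
            self._seq += 1

    def delete_min(self):
        """Return (timestamp, payload) of the earliest event, or None if empty."""
        with self._lock:
            if not self._heap:
                return None
            timestamp, _, payload = heapq.heappop(self._heap)
            return timestamp, payload


def run_demo(num_workers=4, packets_per_worker=100):
    """Several threads insert packet events; one thread drains them in order."""
    queue = LockedEventQueue()

    def producer(worker_id):
        for i in range(packets_per_worker):
            # Hypothetical arrival time, chosen so all timestamps are distinct.
            queue.insert(i * num_workers + worker_id, (worker_id, i))

    threads = [threading.Thread(target=producer, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    drained = []
    while (event := queue.delete_min()) is not None:
        drained.append(event[0])
    return drained
```

Calling `run_demo()` returns the timestamps in nondecreasing order regardless of how the producer threads interleave. A single global lock serializes every operation, so this sketch shows only the interface and the ordering guarantee; it says nothing about the finer-grained locking or the performance reported in the abstract.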