Efficient parallel implementations of controlled optimization of traffic phases

Authors:
Sameh Samra;Ahmed El-Mahdy;Walid Gomaa;Yasutaka Wada;Amin Shoukry
Affiliations:
Egypt-Japan University of Science and Technology, Egypt;Egypt-Japan University of Science and Technology, Egypt;Egypt-Japan University of Science and Technology, Egypt;Faculty of Science and Engineering, Waseda University, Japan;Egypt-Japan University of Science and Technology, Egypt
Venue:
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
Year:
2011

Abstract

Finding optimal phase durations for a controlled intersection is a computationally intensive task requiring O(N^3) operations. In this paper we introduce a cost-optimal parallelization of a dynamic programming algorithm that reduces the parallel running time to O(N^2). Three implementations spanning a wide range of parallel hardware are developed. The first targets shared-memory architectures and uses the OpenMP programming model. The second is based on message passing and targets massively parallel machines, including high-performance clusters and supercomputers. The third uses the data-parallel programming model mapped onto Graphics Processing Units (GPUs). Key optimizations include loop reversal, communication pruning, load balancing, and efficient thread-to-processor assignment. Experiments were conducted on an 8-core server, two node boards (128 processors) of an IBM BlueGene/L supercomputer, and an Nvidia GeForce GTX 470 GPU with 448 cores. Results indicate practical scalability on all platforms, with a maximum speedup of 76x on the GTX 470.
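The abstract does not give the recurrence, so the following is only a minimal sketch, assuming a simplified forward dynamic program in the style of controlled optimization of phases. Stage j is the j-th phase, the state s is the total time (in discrete units, up to a horizon T) already allocated, the decision x is the duration given to phase j, and cost[j][x] is a hypothetical penalty table standing in for the real traffic-delay model. Serially the work is O(J * T^2), i.e. O(N^3) when J and T both grow with N. Every value at stage j depends only on stage j-1, so the loop over states carries no dependences. The helper names (`stage_value`, `solve_serial`, `solve_parallel`) are hypothetical, and Python's `concurrent.futures` is used only as a stand-in for the paper's OpenMP, MPI and GPU versions.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import partial

INF = math.inf


def stage_value(cost_j, prev, s):
    """Best value for state s at one stage: minimum over decisions x = 0..s.

    Reads only the previous stage's table `prev` and costs O(s) operations.
    """
    return min(cost_j[x] + prev[s - x] for x in range(s + 1))


def solve_serial(cost, T):
    """Forward DP over stages; O(J * T^2) work for J stages and horizon T."""
    prev = [0.0] + [INF] * T  # V_0: nothing allocated yet, so only s = 0 is feasible
    for cost_j in cost:  # stages must be processed in order
        prev = [stage_value(cost_j, prev, s) for s in range(T + 1)]
    return prev[T]  # best total cost when the phases use exactly T units


def solve_parallel(cost, T, workers=4):
    """Same recurrence; the independent states of each stage run concurrently."""
    prev = [0.0] + [INF] * T
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for cost_j in cost:
            # All states of stage j read only stage j-1, so they may run in any
            # order. partial() binds the current cost_j and prev; list() waits
            # for every task before prev is replaced (a barrier between stages).
            prev = list(pool.map(partial(stage_value, cost_j, prev), range(T + 1)))
    return prev[T]


if __name__ == "__main__":
    # Hypothetical quadratic penalty: phase j "wants" demands[j] time units.
    demands, T = [1, 2, 3], 3
    cost = [[(d - x) ** 2 for x in range(T + 1)] for d in demands]
    print(solve_serial(cost, T), solve_parallel(cost, T))  # 3.0 3.0
```

With one worker per state (T + 1 workers), each stage takes O(T) time, so the J stages finish in O(J * T) = O(N^2) while the processor-time product stays at O(N^3), which is what cost-optimal means here. In CPython the global interpreter lock prevents real speedup from threads, so the sketch only shows the decomposition. Because state s performs s + 1 operations, a contiguous block split of states over workers would be uneven; the abstract lists load balancing among its optimizations but does not say which scheme is used.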