Application of the nested rollout policy adaptation algorithm to the traveling salesman problem with time windows

Authors:
Tristan Cazenave;Fabien Teytaud
Affiliations:
LAMSADE, Université Paris Dauphine, France;LAMSADE, Université Paris Dauphine, France, HEC Paris, CNRS, Jouy-en-Josas, France
Venue:
LION'12 Proceedings of the 6th international conference on Learning and Intelligent Optimization
Year:
2012

Citing 9
Cited 1

Algorithms for the vehicle routing and scheduling problems with time window constraints

Operations Research
A Generalized Insertion Heuristic for the Traveling Salesman Problem with Time Windows

Operations Research
An Exact Constraint Logic Programming Algorithm for the Traveling Salesman Problem with Time Windows

Transportation Science
A Hybrid Exact Algorithm for the TSPTW

INFORMS Journal on Computing
Nested Monte-Carlo search

IJCAI'09 Proceedings of the 21st international joint conference on Artificial intelligence
Beam-ACO for the travelling salesman problem with time windows

Computers and Operations Research
Biasing Monte-Carlo simulations through RAVE values

CG'10 Proceedings of the 7th international conference on Computers and games
Optimization of the nested Monte-Carlo algorithm on the traveling salesman problem with time windows

EvoApplications'11 Proceedings of the 2011 international conference on Applications of evolutionary computation - Volume Part II
Nested rollout policy adaptation for Monte Carlo tree search

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume One

Investigating Monte-Carlo methods on the weak Schur problem

EvoCOP'13 Proceedings of the 13th European conference on Evolutionary Computation in Combinatorial Optimization

Abstract

In this paper, we address the minimization of travel cost in the traveling salesman problem with time windows (TSPTW). To perform this minimization, we use the Nested Rollout Policy Adaptation (NRPA) algorithm. NRPA searches at multiple nested levels, maintains the best tour found at each level, and learns a rollout policy at each level. We also show how to improve the original algorithm with a modified rollout policy that helps NRPA avoid time window violations.
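
The level structure described in the abstract can be illustrated with a minimal Python sketch of plain NRPA on a toy TSPTW instance. It is not the authors' implementation. The following are assumptions made for illustration only: the penalty weight PENALTY, the coding of a move as a (previous city, next city) pair, the learning rate alpha, the iteration count, and the five-city instance DIST and WINDOWS. The modified rollout policy mentioned in the abstract is not reproduced here, because the abstract does not specify it.

```python
import math
import random

PENALTY = 10**6  # assumed cost added per violated time window

# Hypothetical toy instance: depot 0 and four customers on a line.
DIST = [[abs(i - j) for j in range(5)] for i in range(5)]
WINDOWS = [(0, 100), (0, 2), (0, 4), (0, 6), (0, 8)]  # (earliest, latest)

def evaluate(tour, dist, windows):
    """Travel cost of depot -> tour -> depot plus PENALTY per late arrival."""
    time, cost, violations, prev = 0.0, 0.0, 0, 0
    for city in tour + [0]:
        cost += dist[prev][city]
        # Arriving early means waiting until the window opens.
        time = max(time + dist[prev][city], windows[city][0])
        if time > windows[city][1]:
            violations += 1
        prev = city
    return cost + PENALTY * violations

def rollout(policy, dist, windows, rng):
    """Level 0: sample one tour, choosing each city by softmax over policy weights."""
    tour, prev, remaining = [], 0, list(range(1, len(dist)))
    while remaining:
        weights = [math.exp(policy.get((prev, c), 0.0)) for c in remaining]
        city = rng.choices(remaining, weights=weights)[0]
        tour.append(city)
        remaining.remove(city)
        prev = city
    return evaluate(tour, dist, windows), tour

def adapt(policy, tour, alpha=1.0):
    """Return a copy of the policy shifted toward the moves of the given tour."""
    new = dict(policy)
    prev, remaining = 0, sorted(set(tour))
    for city in tour:
        z = sum(math.exp(policy.get((prev, c), 0.0)) for c in remaining)
        for c in remaining:
            prob = math.exp(policy.get((prev, c), 0.0)) / z
            new[(prev, c)] = new.get((prev, c), 0.0) - alpha * prob
        new[(prev, city)] = new.get((prev, city), 0.0) + alpha
        remaining.remove(city)
        prev = city
    return new

def nrpa(level, policy, dist, windows, rng, iterations=10):
    """Each level keeps its own best tour and adapts its own copy of the policy."""
    if level == 0:
        return rollout(policy, dist, windows, rng)
    best_score, best_tour = math.inf, None
    for _ in range(iterations):
        score, tour = nrpa(level - 1, policy, dist, windows, rng, iterations)
        if score <= best_score:  # lower is better: we minimize cost
            best_score, best_tour = score, tour
        policy = adapt(policy, best_tour)  # rebinding keeps the caller's policy intact
    return best_score, best_tour

score, tour = nrpa(2, {}, DIST, WINDOWS, random.Random(0))
print(score, tour)
```

Folding window violations into the score as a large penalty lets a single number rank both feasible and infeasible tours, so the search can still make progress before it finds a feasible tour. Because `adapt` returns a new dictionary, each recursive call starts from its parent's current policy without modifying it.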