Optimizing time warp simulation with reinforcement learning techniques

  • Authors:
  • Jun Wang; Carl Tropper

  • Affiliations:
  • McGill University, Montreal, Canada; McGill University, Montreal, Canada

  • Venue:
  • Proceedings of the 39th conference on Winter simulation: 40 years! The best is yet to come
  • Year:
  • 2007

Abstract

Adaptive Time Warp protocols in the literature are usually based on a pre-defined analytic model of the system, expressed as a closed-form function that maps system state to a control parameter. The underlying assumption is that this model is itself optimal. In this paper we present a new approach that uses Reinforcement Learning techniques, also known as simulation-based dynamic programming. Instead of assuming an optimal control strategy, the goal of Reinforcement Learning is to find the optimal strategy through simulation. A value function that captures the history of system feedback is used, and no prior knowledge of the system is required. Our Reinforcement Learning techniques were implemented in a distributed VLSI simulator with the objective of finding the optimal size of a bounded time window. Our experiments using two benchmark circuits indicate that the approach was successful in doing so.
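
The idea of learning a window size from feedback, rather than from a closed-form model, can be illustrated with a minimal sketch. The abstract does not say which Reinforcement Learning algorithm was used, so the code below substitutes a simple epsilon-greedy action-value learner as an assumption. The names `toy_reward` and `learn_window`, the candidate window sizes, and the reward shape (peaking at a window of 40) are all hypothetical and stand in for feedback from a real simulator, such as committed events per unit of wall-clock time.

```python
import random


def toy_reward(window, rng):
    """Hypothetical stand-in for simulator feedback; not taken from the paper.

    The reward peaks at window == 40 and is perturbed by Gaussian noise.
    """
    return -abs(window - 40) / 10.0 + rng.gauss(0.0, 0.1)


def learn_window(windows, reward_fn, steps=2000, epsilon=0.1, seed=0):
    """Estimate a value for each candidate window size from observed rewards."""
    rng = random.Random(seed)
    value = {w: 0.0 for w in windows}  # running mean reward per window size
    count = {w: 0 for w in windows}    # number of times each size was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            w = rng.choice(windows)                   # explore a random size
        else:
            w = max(windows, key=lambda x: value[x])  # exploit the best estimate
        r = reward_fn(w, rng)
        count[w] += 1
        value[w] += (r - value[w]) / count[w]         # incremental mean update

    best = max(windows, key=lambda x: value[x])
    return best, value


best, value = learn_window([10, 20, 40, 80, 160], toy_reward)
print("learned window size:", best)
```

The value table plays the role of the value function that accumulates the history of feedback: no model of the reward is given to the learner, only the observed rewards. In a real Time Warp setting the reward would come from the running simulation, and the system state would likely be part of the lookup key as well.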