Adaptive sampling based large-scale stochastic resource control

  • Authors:
  • Balázs Csanád Csáji; László Monostori

  • Affiliations:
  • Computer and Automation Research Institute, Hungarian Academy of Sciences, Budapest, Hungary; Computer and Automation Research Institute, Hungarian Academy of Sciences, and Faculty of Mechanical Engineering, Budapest University of Technology and Economics

  • Venue:
  • AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1
  • Year:
  • 2006


Abstract

We consider closed-loop solutions to stochastic optimization problems of the resource allocation type. These problems concern the dynamic allocation of reusable resources over time to non-preemptive, interconnected tasks with stochastic durations. The aim is to minimize the expected value of a regular performance measure. First, we formulate the problem as a stochastic shortest path problem and argue that our formulation has favorable properties: it has a finite horizon, it is acyclic (hence all policies are proper), and the space of control policies can be safely restricted. Then, we propose an iterative solution. Essentially, we apply a reinforcement-learning-based adaptive sampler to compute a sub-optimal control policy. We suggest several approaches to enhance this solution and make it applicable to large-scale problems. The main improvements are: (1) the value function is maintained by feature-based support vector regression; (2) the initial exploration is guided by rollout algorithms; (3) the state space is partitioned by clustering the tasks while keeping the precedence constraints satisfied; (4) the action space is decomposed and, consequently, the number of available actions in a state is decreased; and, finally, (5) we argue that the sampling can be effectively distributed among several processors. The effectiveness of the approach is demonstrated by experimental results on both artificial (benchmark) and real-world (industry-related) data.
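
As a rough illustration of the formulation (not code from the paper), the minimal Python sketch below assigns tasks with precedence constraints and stochastic durations to machines one decision at a time, so the horizon is finite and the induced stochastic shortest path is acyclic; trajectories are sampled and a value function is fitted to Monte Carlo cost-to-go targets. The toy instance, the feature map, and the plain least-squares regressor are all illustrative stand-ins for the paper's feature-based support vector regression, and the rollout guidance, task clustering, and distributed-sampling enhancements are omitted.

```python
import random
import numpy as np

# Toy instance (hypothetical data): a precedence DAG of tasks with
# stochastic durations, to be scheduled on identical machines so as to
# minimize the expected makespan (a regular performance measure).
N_TASKS, N_MACHINES = 5, 2
PREDS = {0: set(), 1: set(), 2: {0}, 3: {0, 1}, 4: {2, 3}}

def sample_duration(task):
    return random.uniform(1.0, 3.0)  # non-preemptive, stochastic durations

def features(done, free, finish):
    """Hand-crafted state features (an illustrative choice)."""
    return np.array([len(done), max(free), min(free),
                     max(finish.values(), default=0.0), 1.0])

def step(done, free, finish, task, machine):
    """Assign a ready task to a machine; sample the successor state."""
    start = max(free[machine],
                max((finish[p] for p in PREDS[task]), default=0.0))
    end = start + sample_duration(task)
    free2 = list(free)
    free2[machine] = end
    return done | {task}, tuple(free2), {**finish, task: end}

def simulate(weights, eps=0.2):
    """One sampled trajectory, eps-greedy w.r.t. the fitted value function."""
    done, free, finish = frozenset(), (0.0,) * N_MACHINES, {}
    visited = []
    while len(done) < N_TASKS:  # exactly N_TASKS decisions: acyclic, proper
        ready = [t for t in range(N_TASKS) if t not in done and PREDS[t] <= done]
        acts = [(t, m) for t in ready for m in range(N_MACHINES)]
        if weights is None or random.random() < eps:
            a = random.choice(acts)
        else:  # one-step lookahead on the learned value function
            a = min(acts, key=lambda tm:
                    features(*step(done, free, finish, *tm)) @ weights)
        visited.append(features(done, free, finish))
        done, free, finish = step(done, free, finish, *a)
    return visited, max(finish.values())  # visited states, realized makespan

def train(n_iter=200):
    """Fit the value function to Monte Carlo cost-to-go targets; with a
    purely terminal cost, the cost-to-go from every visited state along a
    trajectory equals the final makespan."""
    X, y, weights = [], [], None
    for _ in range(n_iter):
        states, cost = simulate(weights)
        X += states
        y += [cost] * len(states)
        weights, *_ = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)
    return weights

if __name__ == "__main__":
    w = train()
    print("sampled makespan of greedy policy:", simulate(w, eps=0.0)[1])
```

The sketch exhibits why all policies are proper here: each decision schedules exactly one task, so every trajectory terminates after N_TASKS steps regardless of the policy.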