We present an algorithm for aggregating states when solving large MDPs (represented as factored MDPs) using search by successive refinement in the space of non-homogeneous partitions. Homogeneity is defined in terms of stochastic bisimulation and reward equivalence within the blocks of a partition. Because homogeneous partitions that define equivalent reduced-state-space MDPs can have a large number of blocks, we relax the requirement of homogeneity. The algorithm constructs approximate aggregate MDPs from non-homogeneous partitions, solves the aggregate MDPs exactly, and then uses the resulting value functions as part of a heuristic for refining the current best non-homogeneous partition. We outline the theory motivating this heuristic and present empirical results. In addition to investigating more exhaustive local search methods, we explore techniques derived from research on discretizing continuous state spaces. Finally, we compare the results of our algorithms, which search in the space of non-homogeneous partitions, with exact and approximate algorithms that represent homogeneous and approximately homogeneous partitions as decision trees or algebraic decision diagrams.
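The aggregate-solve-refine loop described above can be illustrated with a minimal sketch. It is not the authors' algorithm or their heuristic: it assumes a small flat (non-factored) MDP with state-only rewards, weights the states of a block uniformly when building the aggregate MDP, and splits the block whose one-step backed-up values vary most. The names `aggregate_mdp`, `value_iteration`, `refine`, and `solve_by_refinement`, and the parameters `gamma` and `max_blocks`, are hypothetical.

```python
import numpy as np


def aggregate_mdp(P, R, blocks):
    """Build an approximate aggregate MDP from a partition of the states.

    P: (A, S, S) transition probabilities, R: (S,) state rewards,
    blocks: list of non-empty lists of state indices.
    Assumption: states inside a block are weighted uniformly.
    """
    S = R.shape[0]
    M = np.zeros((S, len(blocks)))           # state-to-block membership
    for b, states in enumerate(blocks):
        M[states, b] = 1.0
    W = (M / M.sum(axis=0)).T                # (B, S); each row sums to 1
    P_agg = np.einsum("bs,ast,tc->abc", W, P, M)
    R_agg = W @ R
    return P_agg, R_agg


def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Solve an MDP (here, the small aggregate one) by value iteration."""
    V = np.zeros(R.shape[0])
    while True:
        V_new = R + gamma * (P @ V).max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def refine(P, R, blocks, V_agg, gamma=0.9):
    """Split the block whose states disagree most after one backup.

    Illustrative heuristic only: lift the aggregate values to the original
    states, apply one Bellman backup, and split the block with the largest
    spread at the midpoint of its backed-up values.
    """
    V_lift = np.empty(R.shape[0])
    for b, states in enumerate(blocks):
        V_lift[states] = V_agg[b]
    backup = R + gamma * (P @ V_lift).max(axis=0)
    spreads = [backup[s].max() - backup[s].min() for s in blocks]
    worst = int(np.argmax(spreads))
    if spreads[worst] <= 1e-12:              # every block already agrees
        return blocks, 0.0
    states = blocks[worst]
    mid = (backup[states].max() + backup[states].min()) / 2
    low = [s for s in states if backup[s] <= mid]
    high = [s for s in states if backup[s] > mid]
    return blocks[:worst] + [low, high] + blocks[worst + 1:], spreads[worst]


def solve_by_refinement(P, R, gamma=0.9, max_blocks=8):
    """Start from a single block and refine until no split helps."""
    blocks = [list(range(R.shape[0]))]
    while True:
        P_agg, R_agg = aggregate_mdp(P, R, blocks)
        V_agg = value_iteration(P_agg, R_agg, gamma)
        new_blocks, spread = refine(P, R, blocks, V_agg, gamma)
        if spread <= 1e-12 or len(new_blocks) > max_blocks:
            return blocks, V_agg
        blocks = new_blocks
```

Calling `solve_by_refinement(P, R)` returns the final partition together with the value function of its aggregate MDP. The `max_blocks` cap stands in for the point made in the abstract that fully homogeneous partitions can be too large to be useful, so the search may stop at a non-homogeneous one.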