This paper introduces an approach to off-policy Monte Carlo (MC) learning guided by behaviour patterns gleaned from approximation spaces and the rough set theory introduced by Zdzislaw Pawlak in 1981. During reinforcement learning, an agent selects actions in an effort to maximize a reward signal obtained from the environment. The problem considered in this paper is how to estimate the expected value of cumulative future discounted rewards when evaluating agent actions during reinforcement learning. The solution to this problem results from a form of weighted sampling that combines MC methods with approximation spaces to estimate the expected return on actions. This is made possible by considering the behaviour patterns of an agent in the context of approximation spaces. The framework provided by an approximation space makes it possible to measure the degree to which agent behaviours are a part of (''covered by'') a set of accepted agent behaviours that serves as a behaviour-evaluation norm. Furthermore, this article introduces an adaptive action-control strategy called run-and-twiddle (RT), a form of adaptive learning introduced by Oliver Selfridge in 1984, in which approximation spaces are constructed on a ''need by need'' basis. Finally, a monocular vision system has been selected to facilitate the evaluation of the reinforcement learning methods. The goal of the vision system is to track a moving object, with rewards based on the proximity of the object to the centre of the camera's field of view. The contribution of this article is the introduction of an RT form of off-policy MC learning.
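The two ingredients the abstract combines can be illustrated briefly. The sketch below is not the paper's implementation; it assumes a simple episodic setting with tabular policies given as state-action probability maps, and illustrates (a) a weighted (per-episode) importance-sampling MC estimate of the target policy's expected return, and (b) a rough-inclusion style coverage degree measuring how far a set of observed behaviours lies inside a set of accepted behaviours serving as a norm. All function and variable names here are illustrative, not taken from the paper.

```python
def weighted_is_estimate(episodes, target_probs, behaviour_probs, gamma=0.9):
    """Weighted importance-sampling MC estimate of the target policy's
    expected discounted return, using episodes generated by a behaviour
    policy (off-policy evaluation).

    episodes        : list of (states, actions, rewards) trajectories
    target_probs    : {(state, action): prob under the target policy}
    behaviour_probs : {(state, action): prob under the behaviour policy}
    """
    num, den = 0.0, 0.0
    for states, actions, rewards in episodes:
        rho, ret = 1.0, 0.0
        for t, (s, a) in enumerate(zip(states, actions)):
            # Accumulate the importance-sampling ratio along the trajectory.
            rho *= target_probs[(s, a)] / behaviour_probs[(s, a)]
            # Accumulate the discounted return.
            ret += gamma ** t * rewards[t]
        num += rho * ret
        den += rho
    # Weighted estimator: ratio-weighted average of returns.
    return num / den if den else 0.0


def coverage_degree(behaviours, norm):
    """Degree to which observed behaviours are 'covered by' a set of
    accepted behaviours (a rough-inclusion style measure |B ∩ N| / |B|)."""
    b = set(behaviours)
    return len(b & set(norm)) / len(b) if b else 1.0
```

For a single episode whose behaviour policy picks each taken action with probability 0.5 while the target policy picks it with probability 1, every step contributes a ratio of 2, and the weighted estimator simply returns that episode's discounted return. A coverage degree near 1 would indicate the agent's behaviour patterns conform to the norm; in an RT-style scheme, a falling coverage degree could trigger the construction of a new approximation space.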