In this paper, we focus on a new transport service called the on-demand bus system. A key feature of this system is that buses pick up customers door-to-door when requested. Buses therefore have no predetermined routes, and routes must change according to how often customers appear at each location. To find more effective travel plans for this problem, we adopt Q-learning, a reinforcement learning algorithm. However, standard Q-learning is inadequate for our target problem because the number of customers waiting at each pick-up point depends on time. We therefore modify both the Q-value update process and the selection of the next pick-up point using time-passage parameters: rewards are understated in the update process, whereas Q values are overstated in the selection process. Finally, we report simulation results that demonstrate the effectiveness of our algorithm for this problem.
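The two modifications can be illustrated with tabular Q-learning in which the state is the bus's current pick-up point and the action is the next point to visit. The core step is the standard Q-learning update Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). This is a minimal sketch under stated assumptions only: the state/action encoding, the reward (number of customers picked up), the class name `TimeAwareQLearner`, and both adjustment formulas (a reward divided by `1 + damp * elapsed`, and a selection bonus of `boost * elapsed`) are hypothetical, since the abstract does not give the actual time-passage parameters.

```python
import random


class TimeAwareQLearner:
    """Tabular Q-learning for choosing the next pick-up point.

    Hypothetical sketch: the time-passage adjustments (damping rewards in
    the update, boosting Q values in the selection) are illustrative forms,
    not formulas taken from the paper.
    """

    def __init__(self, n_points, alpha=0.1, gamma=0.9, epsilon=0.1,
                 damp=0.05, boost=0.05, seed=None):
        self.n = n_points       # number of pick-up points (must be >= 2)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration rate for epsilon-greedy choice
        self.damp = damp        # hypothetical: strength of reward understatement
        self.boost = boost      # hypothetical: strength of Q-value overstatement
        self.rng = random.Random(seed)
        # q[s][a]: estimated value of driving from point s to point a
        self.q = [[0.0] * n_points for _ in range(n_points)]
        # time at which each pick-up point was last served
        self.last_visit = [0.0] * n_points

    def select_next(self, current, now):
        """Epsilon-greedy choice over time-boosted ("overstated") Q values."""
        candidates = [a for a in range(self.n) if a != current]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)

        def boosted(a):
            # Points left unvisited longer get a larger bonus, on the
            # assumption that customers have been accumulating there.
            return self.q[current][a] + self.boost * (now - self.last_visit[a])

        return max(candidates, key=boosted)

    def update(self, current, nxt, customers, now):
        """One Q-learning step after driving to `nxt` and picking up `customers`."""
        elapsed = now - self.last_visit[nxt]
        # Understate the reward: customers who piled up during a long gap
        # are not credited entirely to this choice (assumed damping form).
        reward = customers / (1.0 + self.damp * elapsed)
        future = max(self.q[nxt][a] for a in range(self.n) if a != nxt)
        target = reward + self.gamma * future
        self.q[current][nxt] += self.alpha * (target - self.q[current][nxt])
        self.last_visit[nxt] = now
        return reward
```

For example, with three points and `epsilon=0.0`, calling `agent.update(0, 1, 4, 10.0)` credits a damped reward of 4 / 1.5 to the move 0 -> 1. A subsequent `agent.select_next(0, 12.0)` then returns point 2: its 12 time units without service outweigh point 1's small learned Q value. With `boost=0.0`, the greedy choice would revert to point 1.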