Optimality of threshold policies for transmission scheduling in correlated fading channels
IEEE Transactions on Communications
We consider transmission scheduling using an ARQ protocol with retransmissions, given channel state information (CSI) and a correlated fading channel. The problem is formulated as a countable-state, infinite-horizon, average-cost Markov decision process (MDP) with an average delay constraint. Our main result gives sufficient conditions on the channel memory and transmission cost under which the optimal transmission scheduling policy is a monotonically increasing function of the buffer occupancy. In proving this result, we first establish positive recurrence (stability) of the buffer. The monotone-structure proof then consists of two steps. First, the constrained MDP (CMDP) is transformed into an unconstrained MDP using a Lagrangian dynamic programming formulation, and the unconstrained optimal policy is proved to be pure and monotonically increasing in the buffer occupancy. Second, the constrained optimal policy is shown to be a randomized mixture of two pure transmission policies, each monotone in the buffer state. Finally, the monotone structure of the optimal transmission policy is exploited to derive a monotone-policy Q-learning algorithm and a stochastic-approximation-based monotone policy search algorithm for estimating the optimal policy in real time.
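The monotone-policy learning idea in the last sentence can be illustrated with a toy sketch. Everything below is an assumption: the buffer size, arrival and success probabilities, the fixed Lagrange multiplier, and the use of a discounted rather than average-cost criterion are all illustrative choices, not the paper's formulation. The structural point it demonstrates is projecting the greedy policy learned by tabular Q-learning onto the class of policies that are nondecreasing in buffer occupancy (i.e., threshold policies).

```python
import random

random.seed(0)

# Illustrative parameters (assumed, not from the paper).
B = 10          # buffer capacity
P_SUCC = 0.7    # probability a transmission succeeds
P_ARR = 0.4     # probability a new packet arrives
LAMBDA = 2.0    # fixed Lagrange multiplier pricing the transmission cost
GAMMA = 0.95    # discount factor (the paper uses an average-cost criterion)

def step(s, a):
    """One transition of the toy buffer: a=1 transmits, a=0 waits.

    Returns (next buffer occupancy, Lagrangian one-step cost), where the
    cost is holding cost (buffer occupancy) plus the priced transmit cost.
    """
    dep = 1 if (a == 1 and s > 0 and random.random() < P_SUCC) else 0
    arr = 1 if random.random() < P_ARR else 0
    s_next = min(B, s - dep + arr)
    return s_next, s + LAMBDA * a

def monotone_projection(policy):
    """Project a 0/1 policy onto policies nondecreasing in buffer state
    via a running maximum: once the policy transmits, it keeps transmitting."""
    out, hi = [], 0
    for a in policy:
        hi = max(hi, a)
        out.append(hi)
    return out

# Tabular Q-learning with epsilon-greedy exploration on the toy model.
Q = [[0.0, 0.0] for _ in range(B + 1)]
s = 0
for _ in range(100_000):
    a = random.randrange(2) if random.random() < 0.1 else int(Q[s][1] < Q[s][0])
    s_next, c = step(s, a)
    Q[s][a] += 0.1 * (c + GAMMA * min(Q[s_next]) - Q[s][a])
    s = s_next

greedy = [int(Q[s][1] < Q[s][0]) for s in range(B + 1)]
policy = monotone_projection(greedy)  # enforce the threshold structure
```

The projection step is the point of interest: the paper's structural result guarantees that an optimal policy is monotone in buffer occupancy, so restricting the learned policy to that class discards estimation noise without excluding the optimum.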