Q-Learning Algorithms for Constrained Markov Decision Processes With Randomized Monotone Policies: Application to MIMO Transmission Control

Authors:
D.V. Djonin;V. Krishnamurthy
Affiliations:
Dyaptive, Inc, Vancouver, BC;-
Venue:
IEEE Transactions on Signal Processing
Year:
2007

Citing 0
Cited 9

Conjectural equilibrium in multiuser power control games

IEEE Transactions on Signal Processing
Cross-layer design of FDD-OFDM systems based on ACK/NAK feedbacks

IEEE Transactions on Information Theory
Joint admission control and antenna assignment for multiclass QoS in spatial multiplexing MIMO wireless networks

IEEE Transactions on Wireless Communications
Monotonicity of constrained optimal transmission policies in correlated fading channels with ARQ

IEEE Transactions on Signal Processing
On-line learning and optimization for wireless video transmission

IEEE Transactions on Signal Processing
A dynamical games approach to transmission-rate adaptation in multimedia WLAN

IEEE Transactions on Signal Processing
Distributive stochastic learning for delay-optimal OFDMA power and subband allocation

IEEE Transactions on Signal Processing
Delay-optimal user scheduling and inter-cell interference management in cellular network via distributive stochastic learning

IEEE Transactions on Wireless Communications
A constrained MDP-based vertical handoff decision algorithm for 4G heterogeneous wireless networks

Wireless Networks

Quantified Score

Hi-index	35.76

Abstract

This paper presents novel Q-learning based stochastic control algorithms for rate and power control in V-BLAST transmission systems. The algorithms exploit the supermodularity and monotonic structure results derived in the companion paper. The rate and power control problem is posed as a stochastic optimization problem whose goal is to minimize the average transmission power subject to a constraint on the average delay, which can be interpreted as the quality of service requirement of a given application. The standard Q-learning algorithm is modified to handle the constraints so that it can adaptively learn a structured optimal policy for unknown channel/traffic statistics. We discuss the convergence of the proposed algorithms and explore their properties in simulations. To address the issue of unknown transmission costs in an unknown time-varying environment, we propose a variant of the Q-learning algorithm in which power costs are estimated in an online fashion, and we show that this algorithm converges to the optimal solution as long as the power cost estimates are asymptotically unbiased.
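
As a rough illustration of how Q-learning can be modified to handle an average-delay constraint, the sketch below folds the constraint into the per-step cost with a Lagrange multiplier and adapts that multiplier on a slower timescale. It is not the paper's algorithm: the toy buffer model, the quadratic power cost, the discount factor, the step sizes, and every name (`MAX_BUFFER`, `MAX_RATE`, `feasible`, `q_update`, `multiplier_update`, `train`) are assumptions made for illustration, and the monotone policy structure the paper exploits is not used here.

```python
import random

MAX_BUFFER = 5   # hypothetical buffer size in packets
MAX_RATE = 2     # hypothetical maximum packets sent per slot


def feasible(s):
    """Actions allowed in buffer state s: send 0..min(s, MAX_RATE) packets."""
    return range(min(s, MAX_RATE) + 1)


def q_update(q, s, a, cost, s_next, actions=feasible, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step for cost minimization (discounted, by assumption)."""
    best_next = min(q[s_next][b] for b in actions(s_next))
    q[s][a] += alpha * (cost + gamma * best_next - q[s][a])


def multiplier_update(lam, delay, delay_bound, step):
    """Projected step on the Lagrange multiplier; it grows while delay exceeds the bound."""
    return max(0.0, lam + step * (delay - delay_bound))


def train(steps=20000, delay_bound=2.0, arrival_prob=0.5, eps=0.1, seed=0):
    """Learn Q-values for the Lagrangian cost: power + lam * delay."""
    rng = random.Random(seed)
    q = [[0.0] * (MAX_RATE + 1) for _ in range(MAX_BUFFER + 1)]
    lam, s = 0.0, 0
    for t in range(1, steps + 1):
        acts = list(feasible(s))
        if rng.random() < eps:                      # epsilon-greedy exploration
            a = rng.choice(acts)
        else:
            a = min(acts, key=lambda b: q[s][b])    # greedy w.r.t. current Q
        power = float(a * a)                        # toy convex power cost
        delay = float(s)                            # buffer occupancy as delay proxy
        arrival = 1 if rng.random() < arrival_prob else 0
        s_next = min(s - a + arrival, MAX_BUFFER)   # overflow packets are dropped
        q_update(q, s, a, power + lam * delay, s_next)
        # The multiplier moves on a slower timescale than the Q-values.
        lam = multiplier_update(lam, delay, delay_bound, step=1.0 / (100 + t))
        s = s_next
    return q, lam


if __name__ == "__main__":
    q, lam = train()
    policy = [min(feasible(s), key=lambda b: q[s][b]) for s in range(MAX_BUFFER + 1)]
    print("multiplier:", lam)
    print("greedy packets sent per buffer state:", policy)
```

The design choice illustrated is the two-timescale split: the Q-values track the best policy for the current multiplier, while the multiplier drifts slowly toward a value at which the delay constraint is met.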