Multi-agent Q-learning of channel selection in multi-user cognitive radio systems: a two by two case

Authors:
Husheng Li
Affiliations:
Department of Electrical Engineering and Computer Science, the University of Tennessee, Knoxville, TN
Venue:
SMC'09 Proceedings of the 2009 IEEE international conference on Systems, Man and Cybernetics
Year:
2009

Abstract

Resource allocation is an important issue in cognitive radio systems. It can be carried out through negotiation among secondary users. However, negotiation may incur significant overhead, since it must be repeated frequently to track the rapid changes in primary users' activity. In this paper, a channel selection scheme without negotiation is considered for multi-user, multi-channel cognitive radio systems. To avoid the collisions caused by the lack of coordination, each secondary user learns how to select channels from its own experience. Multi-agent reinforcement learning (MARL) is applied in the framework of Q-learning by treating the opponent secondary users as part of the environment. The dynamics of the Q-learning are illustrated using a Metrick-Polak plot. A rigorous proof of the convergence of the Q-learning is provided via its similarity to the Robbins-Monro algorithm, together with an analysis of the convergence of the corresponding ordinary differential equation (via a Lyapunov function). Examples are illustrated, and the performance of the learning is evaluated by numerical simulations.
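The learning scheme described in the abstract can be illustrated with a minimal sketch for two secondary users and two channels. The abstract does not state the update rule, the reward model, or the exploration strategy, so everything specific here is an assumption: a stateless Q-table per user, a constant step size, Boltzmann (softmax) channel selection, zero reward on collision, and the hypothetical names `boltzmann_choice`, `simulate`, `rewards`, `step_size`, and `temperature`. Each user updates only its own Q-values and never observes the other user's choice directly, which is one way to read "opponent secondary users as part of the environment".

```python
import math
import random


def boltzmann_choice(q_values, temperature, rng):
    """Pick a channel with probability proportional to exp(Q / temperature)."""
    weights = [math.exp(q / temperature) for q in q_values]
    threshold = rng.random() * sum(weights)
    cumulative = 0.0
    for channel, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:
            return channel
    return len(q_values) - 1  # guard against floating-point round-off


def simulate(rewards, steps=5000, step_size=0.05, temperature=0.1, seed=0):
    """Run independent Q-learning for every user and return the Q-tables.

    rewards[u][c] is the (assumed) payoff user u gets on channel c when it
    is the only user there; a collision yields zero reward (assumption).
    """
    rng = random.Random(seed)
    num_users = len(rewards)
    num_channels = len(rewards[0])
    q = [[0.0] * num_channels for _ in range(num_users)]

    for _ in range(steps):
        # All users choose simultaneously, without negotiation.
        choices = [boltzmann_choice(q[u], temperature, rng)
                   for u in range(num_users)]
        for u, channel in enumerate(choices):
            collided = choices.count(channel) > 1
            reward = 0.0 if collided else rewards[u][channel]
            # Stochastic-approximation style update of the chosen entry only.
            q[u][channel] += step_size * (reward - q[u][channel])
    return q


if __name__ == "__main__":
    # Hypothetical 2-by-2 payoffs: each user prefers a different channel.
    q_tables = simulate([[1.0, 0.5], [0.5, 1.0]])
    for user, row in enumerate(q_tables):
        print(f"user {user}: Q = {[round(v, 3) for v in row]}")
```

The incremental update has the form of a stochastic approximation step, which is the kind of structure a Robbins-Monro style convergence argument works with; the sketch itself makes no convergence claim and is not the paper's simulation setup.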