An Aloha-like spectrum access scheme without negotiation is considered for multiuser, multichannel cognitive radio systems. To avoid the collisions incurred by the lack of coordination, each secondary user learns how to select channels according to its experience. Multiagent reinforcement learning (MARL) is applied so that the secondary users learn good channel-selection strategies. Specifically, the framework of Q-learning is extended from the single-user case to the multiagent case by treating the other secondary users as part of the environment. The dynamics of the Q-learning are illustrated using a Metrick-Polak plot, which shows the traces of Q-values in the two-user case. For both the complete and partial observation cases, rigorous proofs of the convergence of multiagent Q-learning without communications, under certain conditions, are provided using the Robbins-Monro algorithm and contraction mapping, respectively. The learning performance (speed and gain in utility) is evaluated by numerical simulations.
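The scheme described above can be sketched in code. The following is a minimal illustration, not the paper's exact algorithm: each secondary user runs an independent, stateless Q-learner over the channels, treating the other users as part of the environment, with a collision yielding zero reward. The constants (learning rate, exploration rate, user and channel counts) are assumed values chosen for the sketch.

```python
import random

NUM_USERS = 2     # assumed: two secondary users, as in the Metrick-Polak illustration
NUM_CHANNELS = 2  # assumed: two available channels
ALPHA = 0.1       # assumed learning rate
EPSILON = 0.1     # assumed exploration rate

def choose(q, eps, rng):
    """Epsilon-greedy channel selection from one user's Q-values."""
    if rng.random() < eps:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda c: q[c])

def run(steps=20000, seed=0):
    """Simulate independent Q-learning; return each user's preferred channel."""
    rng = random.Random(seed)
    # One Q-value per channel, per secondary user (stateless Q-learning).
    Q = [[0.0] * NUM_CHANNELS for _ in range(NUM_USERS)]
    for _ in range(steps):
        picks = [choose(q, EPSILON, rng) for q in Q]
        for u, c in enumerate(picks):
            # Reward 1 for a collision-free transmission, 0 on collision.
            r = 1.0 if picks.count(c) == 1 else 0.0
            Q[u][c] += ALPHA * (r - Q[u][c])
    return [max(range(NUM_CHANNELS), key=lambda c: q[c]) for q in Q]
```

In this two-user, two-channel setting the learners typically settle on an orthogonal assignment, each user preferring a distinct channel, which is the collision-avoiding behavior the learning scheme is meant to produce.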