An Aloha-like spectrum access scheme without negotiation is considered for multiuser, multichannel cognitive radio systems. To avoid the collisions incurred by the lack of coordination, each secondary user learns how to select channels according to its experience. Multiagent reinforcement learning (MARL) is applied so that the secondary users learn good channel-selection strategies. Specifically, the framework of Q-learning is extended from the single-user case to the multiagent case by treating the other secondary users as part of the environment. The dynamics of the Q-learning are illustrated using a Metrick-Polak plot, which shows the traces of Q-values in the two-user case. For both the complete and partial observation cases, rigorous proofs of the convergence of multiagent Q-learning without communications, under certain conditions, are provided using the Robbins-Monro algorithm and contraction mapping, respectively. The learning performance (speed and gain in utility) is evaluated by numerical simulations.
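The scheme described above can be sketched in code. The following is a minimal illustration, not the paper's exact algorithm: each secondary user runs an independent, stateless Q-learner over the channels, treating the other users as part of the environment, with a collision yielding zero reward. The constants (learning rate, exploration rate, user and channel counts) are assumed values chosen for the sketch.

```python
import random

NUM_USERS = 2     # assumed: two secondary users, as in the Metrick-Polak illustration
NUM_CHANNELS = 2  # assumed: two available channels
ALPHA = 0.1       # assumed learning rate
EPSILON = 0.1     # assumed exploration rate

def choose(q, eps, rng):
    """Epsilon-greedy channel selection from one user's Q-values."""
    if rng.random() < eps:
        return rng.randrange(len(q))
    return max(range(len(q)), key=lambda c: q[c])

def run(steps=20000, seed=0):
    """Simulate independent Q-learning; return each user's preferred channel."""
    rng = random.Random(seed)
    # One Q-value per channel, per secondary user (stateless Q-learning).
    Q = [[0.0] * NUM_CHANNELS for _ in range(NUM_USERS)]
    for _ in range(steps):
        picks = [choose(q, EPSILON, rng) for q in Q]
        for u, c in enumerate(picks):
            # Reward 1 for a collision-free transmission, 0 on collision.
            r = 1.0 if picks.count(c) == 1 else 0.0
            Q[u][c] += ALPHA * (r - Q[u][c])
    return [max(range(NUM_CHANNELS), key=lambda c: q[c]) for q in Q]
```

In this two-user, two-channel setting the learners typically settle on an orthogonal assignment, each user preferring a distinct channel, which is the collision-avoiding behavior the learning scheme is meant to produce.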