This letter addresses the problem of designing the transition probabilities of a finite Markov chain (the policy) so as to minimize the expected cost of reaching a destination node from a source node while maintaining a fixed level of entropy spread throughout the network (the exploration). It is motivated by the following scenario. Suppose you have to route agents through a network in some optimal way, for instance by minimizing the total travel cost; so far, a standard shortest-path algorithm would suffice. Suppose, however, that you want to avoid purely deterministic routing policies, for instance to allow continual exploration of the network, avoid congestion, or avoid complete predictability of your routing strategy. In other words, you want to introduce some randomness into the routing policy (i.e., the routing policy is randomized). This problem, which will be called the randomized shortest-path problem (RSP), is investigated in this work. The global level of randomness of the routing policy is quantified by the expected Shannon entropy spread throughout the network and is provided a priori by the designer. Necessary conditions for computing the optimal randomized policy (the one minimizing the expected routing cost) are then derived. Iterating these conditions, which are reminiscent of Bellman's value-iteration equations, yields an optimal policy, that is, a set of transition probabilities at each node. Interestingly, this first model, although formulated in a totally different framework, is equivalent to Akamatsu's model (1996) from transportation science for a special choice of the entropy constraint. We therefore revisit Akamatsu's model by recasting it into a sum-over-paths statistical physics formalism that allows all quantities of interest to be derived in an elegant, unified way. For instance, it is shown that the unique optimal policy can be obtained by solving a simple linear system of equations. This second model is therefore preferable because of its computational efficiency and soundness. Finally, simulation results obtained on simple, illustrative examples show that the models behave as expected.
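The letter itself gives the exact derivation; as a rough illustration of the sum-over-paths idea, the sketch below computes a randomized routing policy on a toy graph by solving one linear system. The graph, the edge costs, and the inverse-temperature parameter `theta` (which plays the role of the entropy-level knob: large `theta` concentrates on the shortest path, small `theta` approaches a pure random walk) are illustrative assumptions, not taken from the letter.

```python
import numpy as np

# Hedged sketch, not the authors' exact formulation: a Boltzmann-weighted
# random walk toward an absorbing destination, with the policy recovered
# from the solution of a single linear system.

# Toy 4-node graph with two routes from source 0 to destination 3:
# 0-1-3 (total cost 2) and 0-2-3 (total cost 4). All values are made up.
edges = {(0, 1): 1.0, (1, 0): 1.0, (0, 2): 1.0, (2, 0): 1.0,
         (1, 3): 1.0, (3, 1): 1.0, (2, 3): 3.0, (3, 2): 3.0}
n, dest = 4, 3
C = np.zeros((n, n))   # edge costs (0 where no edge)
A = np.zeros((n, n))   # adjacency mask
for (i, j), c in edges.items():
    C[i, j] = c
    A[i, j] = 1.0

# Reference random-walk transition probabilities: uniform over neighbors.
P_ref = A / A.sum(axis=1, keepdims=True)

def rsp_policy(theta):
    # Boltzmann-weight each allowed transition; P_ref is zero off-edges,
    # so W is automatically zero there too.
    W = P_ref * np.exp(-theta * C)
    W[dest, :] = 0.0                      # destination is absorbing
    # Backward variable z solves the linear system (I - W) z = e_dest.
    e = np.zeros(n)
    e[dest] = 1.0
    z = np.linalg.solve(np.eye(n) - W, e)
    # Randomized policy: transition probabilities proportional to W[i,j]*z[j].
    return W * z / z[:, None]

# Large theta favors the cheaper route 0-1-3; small theta stays close
# to the uniform reference walk P_ref.
p = rsp_policy(theta=5.0)
```

Each non-absorbing row of `p` sums to one by construction, since `z[i]` equals the row's weighted sum of `W[i, j] * z[j]`; the single `solve` call is the "simple linear system" flavor of computation that makes this style of model attractive.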