The theory of Markov Control Models with Perfect State Information (MCM-PSI) requires that the current state of the system be known to the decision maker at decision instants; otherwise, one speaks of Markov Control Models with Imperfect State Information (MCM-ISI). In this article, we introduce a new class of MCM-ISI in which the information on the state of the system is delayed. Such an information structure is encountered, for instance, in high-speed data networks. In the first part of this article, we show that by enlarging the state space so as to include the last known state as well as all decisions made during the travel time of the information, we may reduce an MCM-ISI to an MCM-PSI. In the second part, this result is applied to a flow control problem. We consider a discrete-time queueing model with Bernoulli arrivals and geometric services, where the intensity of the arrival stream is controlled. At the beginning of slot t+1, t = 0, 1, 2, …, the decision maker must select the probability of having one arrival in the current time slot from the set {p1, p2}, 0 ≤ p2 < p1 ≤ 1, on the basis of the queue-length and action histories in [0, t] only. The aim is to optimize a discounted throughput/delay criterion. We show that there exists an optimal policy of threshold type, where the threshold depends on the last action.
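The reduction described above can be illustrated numerically. The sketch below, which is an illustration and not the paper's model, assumes a one-slot information delay, a finite buffer truncation, and an illustrative reward (a per-slot throughput reward minus a linear holding cost); the parameter values (N, p1, p2, q, beta, R, c) are chosen arbitrarily. The augmented state is the pair (last known queue length, last action), and value iteration on this augmented, perfectly observed chain recovers a policy that is threshold in the last known queue length, with the threshold depending on the last action.

```python
# Illustrative sketch only: one-slot delay, arbitrary parameter choices.
N = 20                    # buffer truncation (assumption)
p = {0: 0.6, 1: 0.2}      # action 0 -> p1 (high rate), action 1 -> p2 (low rate)
q = 0.5                   # geometric service probability
beta = 0.95               # discount factor
R, c = 10.0, 1.0          # throughput reward, holding cost (illustrative)

def trans(x, a):
    """Distribution of the next queue length from x under arrival action a."""
    out = {}
    for arr, parr in ((1, p[a]), (0, 1 - p[a])):
        # a departure can occur only if the queue is nonempty
        for dep, pdep in (((1, q), (0, 1 - q)) if x > 0 else ((0, 1.0),)):
            nx = min(N, max(0, x + arr - dep))
            out[nx] = out.get(nx, 0.0) + parr * pdep
    return out

def reward(x):
    # expected throughput reward minus holding cost in a slot with queue x
    return R * q * (x > 0) - c * x

# Value iteration on the augmented state (last known queue length y, last action b):
# the true current state is one (unobserved) transition ahead of y under b.
V = {(y, b): 0.0 for y in range(N + 1) for b in (0, 1)}
for _ in range(2000):
    V = {(y, b): max(
            sum(px * (reward(x) + beta * V[(x, a)])
                for x, px in trans(y, b).items())
            for a in (0, 1))
         for y in range(N + 1) for b in (0, 1)}

policy = {(y, b): max((0, 1), key=lambda a: sum(
              px * (reward(x) + beta * V[(x, a)])
              for x, px in trans(y, b).items()))
          for y in range(N + 1) for b in (0, 1)}
```

For each fixed last action b, `policy[(y, b)]` switches once from the high arrival rate to the low one as the last known queue length y grows, i.e. it is of threshold type, with a threshold that may differ between b = 0 and b = 1, consistent with the structural result stated in the abstract.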