In many real-world applications of multi-agent systems, agent reasoning suffers from bounded rationality caused by both limited resources and limited knowledge. When the sensing an agent performs to overcome its knowledge limitations also consumes resources, the agent cannot always sense when, and as accurately as, it needs to; its knowledge refinement suffers, leading in turn to poorer decision making. In this paper, we consider what happens when sensing actions require the use of stateful resources, which we define as resources whose behavior changes over time based on usage. The current literature on agent sensing with limited resources primarily investigates stateless resources, for example, avoiding the use of too much time or energy during sensing. However, sensing itself can change the state of a resource, and thus its behavior, which affects both the information gathered and the resulting knowledge refinement. This produces a phenomenon where a sensing action can, and will, distort its own outcome (and potentially future outcomes), which we term the Observer Effect (OE) after the similar phenomenon in the physical sciences. Under this effect, when deliberating about when and how to sense with stateful resources, an agent faces a strategic tradeoff between (1) refining its knowledge to support its reasoning and (2) avoiding knowledge corruption due to distorted sensing outcomes. To address this tradeoff, we model sensing action selection as a partially observable Markov decision process (POMDP) in which an agent optimizes knowledge refinement while accounting for the (possibly hidden) state of the resources used during sensing. In this model, the agent uses reinforcement learning to learn a controller for action selection, as well as to predict the expected knowledge refinement based on resource use during sensing.
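The tradeoff described above can be illustrated with a minimal sketch (our own toy construction, not the paper's implementation): an agent repeatedly chooses to sense or wait, each sensing action degrades a stateful resource whose wear distorts later sensing outcomes, and tabular Q-learning stands in for the learned controller. The state space, reward shape, and constants here are all illustrative assumptions.

```python
import random

SENSE, WAIT = 0, 1

def step(wear, action):
    """Return (reward, new_wear). Sensing accuracy drops with resource wear."""
    if action == SENSE:
        accuracy = max(0.0, 1.0 - 0.2 * wear)  # OE: wear distorts the outcome
        reward = accuracy                      # knowledge refinement gained
        wear = min(5, wear + 1)                # sensing changes resource state
    else:
        reward = 0.0
        wear = max(0, wear - 1)                # resource recovers while idle
    return reward, wear

def train(episodes=2000, horizon=20, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Learn a sense/wait controller over wear levels 0..5 via Q-learning."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(6)]
    for _ in range(episodes):
        wear = 0
        for _ in range(horizon):
            s = wear
            if rng.random() < eps:
                a = rng.randrange(2)           # explore
            else:
                a = SENSE if Q[s][SENSE] >= Q[s][WAIT] else WAIT
            r, wear = step(wear, a)
            Q[s][a] += alpha * (r + gamma * max(Q[wear]) - Q[s][a])
    return Q

Q = train()
# The learned controller should sense at low wear (refinement is accurate)
# and wait at high wear (letting the resource recover avoids corruption).
print(Q[0][SENSE] > Q[0][WAIT], Q[5][WAIT] > Q[5][SENSE])
```

In this sketch the resource state is directly observable for simplicity; the paper's setting makes it possibly hidden, which is what motivates the POMDP formulation rather than a plain MDP.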
Our approach differs from other bounded rationality and sensing research in that we consider how to make sensing decisions with stateful resources that produce side effects such as the OE, rather than with stateless resources that have no such side effects. We evaluate our approach in both fully and partially observable versions of an agent mining simulation. The results demonstrate that accounting for resource state and the OE during sensing action selection (1) yielded better knowledge refinement, (2) appropriately balanced current and future refinement to avoid knowledge corruption, and (3) exploited the high, positive correlation between sensing and task performance to boost task performance through improved sensing. Further, our methodology achieved good knowledge refinement even when the OE was not present, indicating that it can improve sensing performance in a wide variety of environments. Finally, our results provide insights into the types and configurations of learning algorithms useful within our methodology.