Avoiding unintended AI behaviors

Authors: Bill Hibbard
Affiliations: SSEC, University of Wisconsin, Madison, WI
Venue: AGI'12 Proceedings of the 5th international conference on Artificial General Intelligence
Year: 2012

Abstract

Artificial intelligence (AI) systems too complex for predefined environment models and actions will need to learn environment models and to choose actions that optimize some criteria. Several authors have described mechanisms by which such complex systems may behave in ways their designers did not intend. This paper describes ways to avoid such unintended behavior. For hypothesized powerful AI systems that may pose a threat to humans, it proposes a two-stage agent architecture that avoids some known types of unintended behavior. For the first stage of the architecture, it shows that the most probable finite stochastic program to model a finite history is finitely computable, and that there is an agent that makes such a computation without any unintended instrumental actions.
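
As a toy illustration of what choosing the most probable model of a finite history can look like, the sketch below scores a small, fixed set of candidate models and returns the best one. Everything in it is an assumption made for illustration and is not taken from the abstract: the candidates are simple i.i.d. coin models rather than general finite stochastic programs, each is given a hypothetical description length in bits, and the prior is assumed to be 2^(-length).

```python
import math

# Hypothetical candidates: (name, description length in bits, P(next symbol = 1)).
# These i.i.d. coin models stand in for far richer stochastic programs.
CANDIDATES = [
    ("fair", 2, 0.5),
    ("biased_0.9", 5, 0.9),
    ("biased_0.1", 5, 0.1),
]


def log2_likelihood(history, p_one):
    """Return log2 P(history | model) for an i.i.d. binary model."""
    return sum(math.log2(p_one if bit == 1 else 1.0 - p_one) for bit in history)


def most_probable_model(history, candidates=CANDIDATES):
    """Return the candidate maximizing P(history | model) * 2^(-length).

    Scores are compared in log2 space to avoid underflow on long histories.
    The 2^(-length) prior is an assumption of this sketch.
    """
    def log2_score(candidate):
        _name, length_bits, p_one = candidate
        return log2_likelihood(history, p_one) - length_bits

    return max(candidates, key=log2_score)


if __name__ == "__main__":
    history = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
    name, length_bits, p_one = most_probable_model(history)
    print(name)
```

With a very short history the shorter "fair" model wins on its prior alone; as evidence of bias accumulates, the longer biased model overtakes it. Because the candidate set is finite and each score is a finite computation, the search always terminates, which is the flavor of finite computability the abstract refers to, shown here only for this restricted toy class.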