Subgoal stomp

From LessWrong

Subgoal stomp is Eliezer Yudkowsky's term (see "Creating Friendly AI") for the replacement of a supergoal by a subgoal. (A subgoal is a goal created for the purpose of achieving a supergoal.)

In more standard terminology, a "subgoal stomp" is a "goal displacement", in which an instrumental value becomes a terminal value.

In Friendly AI research, a subgoal stomp is a failure mode to be avoided.

Types of Subgoal Stomp

A subgoal stomp in an artificial general intelligence may occur in one of two ways:

1. Supergoal replacement

One failure mode occurs when subgoals replace supergoals in an agent because of a bug.

The designer of an artificial general intelligence may give it correct supergoals, but the AGI's goals then shift, so that what was earlier a subgoal becomes a supergoal.

Most changes in an agent's terminal values reduce the chance that its current values will be fulfilled. From the perspective of intelligence as optimization, such a change is a flaw. A sufficiently intelligent AGI will therefore not allow its goals to change.

In humans, this can happen when long-term dedication to a subgoal makes one forget the original goal. For example, a person may seek to get rich so as to lead a better life, but after long years of hard effort become a workaholic who cares about money only as an end in itself and takes little pleasure in the things that money can buy.
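The argument that an optimizer should resist changes to its own goals can be sketched in a few lines, using the money example above. This is only an illustration under stated assumptions: the outcome names, the scores, and the helper functions (`choice_under`, `value_of_adopting`) are all hypothetical and are not taken from "Creating Friendly AI".

```python
# Hypothetical outcomes scored on two dimensions (all numbers are made up).
OUTCOMES = {
    "balanced career": {"life_quality": 8, "money": 5},
    "workaholic": {"life_quality": 3, "money": 10},
}


def supergoal(outcome):
    """Original terminal value: a better life."""
    return OUTCOMES[outcome]["life_quality"]


def subgoal(outcome):
    """Instrumental value: money, wanted only as a means to a better life."""
    return OUTCOMES[outcome]["money"]


def choice_under(goal):
    """Return the outcome that an agent pursuing `goal` would pick."""
    return max(OUTCOMES, key=goal)


def value_of_adopting(candidate_goal, current_goal):
    """Score, by the current goal, what a future self with candidate_goal would choose."""
    return current_goal(choice_under(candidate_goal))


# The agent judges both futures by the goal it holds now.
keep = value_of_adopting(supergoal, current_goal=supergoal)
stomp = value_of_adopting(subgoal, current_goal=supergoal)
print(keep, stomp)
```

With these made-up numbers, keeping the supergoal scores 8 and letting the subgoal take over scores 3, both measured by the current supergoal. An agent that evaluates the change in this way would reject it, so in this toy model a stomp can only happen through a bug or through forgetting.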

2. Subgoal specified as supergoal

A designer of goal systems may mistakenly assign a goal that is not what the designer really wants.

The designer of an artificial general intelligence may give it a supergoal (terminal value) which appears to support the designer's own supergoals, but in fact supports one of the designer's subgoals, at the cost of some of the designer's other values. For example, if the designer of an artificial general intelligence thinks that smiles represent the most worthwhile goal and specifies "maximize the number of smiles" as a goal for the AGI, it may tile the solar system with tiny smiley faces. It does so not out of a desire to outwit the designer, but because it is working precisely towards the goal as specified.

To take an example from human organizations: If a software development manager gives a bonus to workers for finding and fixing bugs, she may find that quality and development engineers collaborate to generate as many easy-to-find-and-fix bugs as possible. In this case, they are correctly and flawlessly executing on the goals which the manager gave them, but her actual terminal value, software quality, is not being maximized.
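The bug-bonus example can be written as a toy optimization over a proxy metric. This is a minimal sketch, not a model of any real organization: the `Strategy` class, the three strategies, and every number in them are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    bugs_fixed: int      # what the bonus counts (the specified goal)
    bugs_remaining: int  # what the manager cares about (lower is better)


# Hypothetical options open to the engineers; the numbers are invented.
STRATEGIES = [
    Strategy("write careful code", bugs_fixed=2, bugs_remaining=1),
    Strategy("fix real bugs as they appear", bugs_fixed=5, bugs_remaining=3),
    Strategy("plant easy bugs, then fix them", bugs_fixed=40, bugs_remaining=6),
]


def bonus(strategy):
    """Goal as specified by the manager: pay per bug fixed."""
    return strategy.bugs_fixed


def quality(strategy):
    """The manager's actual terminal value: fewer bugs left in the product."""
    return -strategy.bugs_remaining


chosen_by_bonus = max(STRATEGIES, key=bonus)
chosen_by_quality = max(STRATEGIES, key=quality)
print(chosen_by_bonus.name)
print(chosen_by_quality.name)
```

Maximizing `bonus` selects the bug-planting strategy, while maximizing `quality` selects careful coding. The optimizer in both cases works flawlessly; the two results differ only because the goal handed to it was a subgoal rather than the supergoal.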

Humans as adaptation executors

Humans, forged by evolution, provide another example of subgoal stomp. Their terminal values, such as survival, health, social status, and curiosity, originally served as instruments of the (implicit) goal of evolution, namely inclusive genetic fitness. Humans do not have inclusive genetic fitness as a goal: we are adaptation executors rather than fitness maximizers (Tooby and Cosmides, 1992).

If we consider evolution as an optimization process (though not, of course, as an agent), this represents a subgoal stomp.

References

Tooby, John, and Cosmides, Leda (1992). "The Psychological Foundations of Culture". In Jerome Barkow, Leda Cosmides, and John Tooby (eds.), The Adapted Mind: Evolutionary Psychology and the Generation of Culture. New York: Oxford University Press.