Wireheading

tags
Book: Human Compatible
  • Wireheading is when an agent directly manipulates its own reward signal (i.e. the prewired one) to get maximal reward, instead of achieving the outcomes that signal was meant to track.

  • Example: When rats were given a lever that directly stimulated their brain's reward circuitry, they kept pressing it until they collapsed.

  • Thought: Are addictions a form of wireheading?

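  • One way to address wireheading in AGI is to remove the incentive to wirehead by separating the reward signal from the actual reward: the agent treats the signal only as information about the actual reward, so tampering with the signal gains it nothing and only destroys that information.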

    • Question: How do we accomplish this separation?
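
A toy sketch of the signal/actual-reward distinction above, not a method from the book. Every name and number here (`Outcome`, `OUTCOMES`, `PRIOR_ACTUAL_REWARD`, the three actions) is a hypothetical chosen for illustration. It assumes the agent already knows which actions corrupt its reward channel, which is exactly the hard part the question above asks about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    signal: float          # what the reward channel reports after the action
    channel_intact: bool   # whether that report still reflects the world


# Hypothetical action outcomes for illustration only.
OUTCOMES = {
    "do_task": Outcome(signal=1.0, channel_intact=True),
    "idle": Outcome(signal=0.0, channel_intact=True),
    "tamper_with_sensor": Outcome(signal=10.0, channel_intact=False),
}

# Assumed belief about actual reward when the signal carries no information.
PRIOR_ACTUAL_REWARD = 0.0


def choose_signal_maximizer(outcomes):
    """Agent that treats the reported signal as the reward itself."""
    return max(outcomes, key=lambda action: outcomes[action].signal)


def estimated_actual_reward(outcome):
    """Use the signal as evidence only while the channel is intact."""
    if outcome.channel_intact:
        return outcome.signal
    return PRIOR_ACTUAL_REWARD


def choose_evidence_based(outcomes):
    """Agent that maximizes its estimate of the actual reward."""
    return max(
        outcomes,
        key=lambda action: estimated_actual_reward(outcomes[action]),
    )


if __name__ == "__main__":
    print(choose_signal_maximizer(OUTCOMES))
    print(choose_evidence_based(OUTCOMES))
```

In this toy setup the first agent picks the tampering action because it inflates the reported number, while the second agent sees no gain in tampering because a corrupted channel tells it nothing about the actual reward.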