Learning through human feedback

In the Atari game Enduro, which involves steering a car to overtake a line of other cars and is very difficult to learn by the trial-and-error techniques of a traditional RL network, human feedback eventually allowed our system to achieve superhuman results. In other games and simulated robotics tasks, it performed comparably to a standard RL set-up, while in a couple of games, such as Qbert and Breakout, it failed to work at all.

But the ultimate purpose of a system like this is to allow humans to specify a goal for the agent, even if that goal is not present in the environment. To test this, we taught agents various novel behaviours, such as performing a backflip, walking on one leg, or driving alongside another car in Enduro rather than overtaking it to maximise the game score.

Source: https://deepmind.com/blog/article/learning-through-human-feedback
