Scalable agent architecture for distributed training

The tasks are designed to be as varied as possible. They differ in the goals they target, from learning, to memory, to navigation. They vary visually, from brightly coloured, modern-styled textures to the subtle browns and greens of a desert at dawn, at midday, or by night. And they are set in physically different worlds, from open mountainous terrain, to right-angled mazes, to open circular rooms.

In addition, some of the environments include ‘bots’ with their own internal, goal-oriented behaviours. Equally importantly, the goals and rewards differ across levels: from following language commands, to using keys to open doors, to foraging for mushrooms, to plotting and following a complex, irreversible path.

At a basic level, however, the environments all share the same action and observation space, allowing a single agent to be trained to act across this highly varied set, as the sketch below illustrates. More details about the environments can be found on the DeepMind Lab GitHub page.
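As a concrete illustration, here is a minimal sketch of acting in several DMLab-30 levels through that single shared interface, using the DeepMind Lab Python API. The two level names follow the DMLab-30 naming on GitHub, and the `random_action` helper is purely illustrative; a trained agent would replace it with its policy.

```python
import numpy as np
import deepmind_lab

# Two of the thirty levels; both expose an identical action/observation spec.
LEVELS = [
    'contributed/dmlab30/rooms_watermaze',
    'contributed/dmlab30/lasertag_three_opponents_small',
]

def random_action(action_spec):
    # One integer per discrete action dimension; the spec is the same
    # in every level, so this helper works unchanged across the suite.
    return np.array([np.random.randint(a['min'], a['max'] + 1)
                     for a in action_spec], dtype=np.intc)

for level in LEVELS:
    env = deepmind_lab.Lab(level, ['RGB_INTERLEAVED'],
                           config={'width': '96', 'height': '72'})
    env.reset()
    spec = env.action_spec()
    for _ in range(1000):
        if not env.is_running():
            break
        frame = env.observations()['RGB_INTERLEAVED']  # same keys in every level
        reward = env.step(random_action(spec), num_steps=4)
```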

Importance-Weighted Actor-Learner Architectures

To tackle the challenging DMLab-30 suite, we developed a new distributed agent called the Importance Weighted Actor-Learner Architecture (IMPALA), which maximises data throughput using an efficient distributed architecture built with TensorFlow.
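The key to that throughput, per the IMPALA paper, is decoupling acting from learning: actors ship complete trajectories of experience to a central learner instead of computing gradients themselves. The sketch below shows that decoupling under stated assumptions; `collect_rollout` and `update_parameters` are hypothetical helpers, and the queue size and unroll length are arbitrary illustrative choices.

```python
import collections
import queue

# A fixed-length slice of experience; behaviour_logits let the learner
# correct for the actors' slightly stale policies (V-trace in the paper).
Trajectory = collections.namedtuple(
    'Trajectory', ['observations', 'actions', 'rewards', 'behaviour_logits'])

trajectory_queue = queue.Queue(maxsize=64)

def actor_loop(env, policy, unroll_length=20):
    # Actors only act: no gradient computation happens here, so they
    # never stall waiting for a learning step to finish.
    while True:
        rollout = collect_rollout(env, policy, unroll_length)  # hypothetical helper
        trajectory_queue.put(Trajectory(*rollout))

def learner_loop(model, batch_size=32):
    # The learner consumes trajectories in large batches, which keeps
    # accelerators busy and maximises data throughput.
    while True:
        batch = [trajectory_queue.get() for _ in range(batch_size)]
        update_parameters(model, batch)  # hypothetical helper
```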

IMPALA is inspired by the popular A3C architecture, which uses multiple distributed actors to learn the agent’s parameters. In models like this, each actor uses a clone of the policy parameters to act in the environment. Periodically, actors pause their exploration to share the gradients they have computed with a central parameter server, which applies the updates (see figure below).

(Figure source: https://deepmind.com/blog/article/impala-scalable-distributed-deeprl-dmlab-30)
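For contrast with the trajectory-passing sketch above, here is a minimal sketch of the A3C-style pattern just described: each actor holds a clone of the policy parameters, computes gradients locally, and periodically pauses to push them to a central parameter server. `compute_gradients` is a hypothetical helper, and the learning rate and sync interval are arbitrary illustrative values.

```python
import copy

class ParameterServer:
    """Central holder of the parameters; the single writer in this pattern."""

    def __init__(self, params, lr=1e-3):
        self.params = params  # e.g. a dict of name -> numpy array
        self.lr = lr

    def apply_gradients(self, grads):
        # Plain SGD update applied centrally, as described in the text.
        for name, g in grads.items():
            self.params[name] -= self.lr * g

    def get_params(self):
        return copy.deepcopy(self.params)

def actor_loop(env, server, steps=10_000, sync_every=20):
    params = server.get_params()  # local clone of the policy parameters
    for step in range(1, steps + 1):
        grads = compute_gradients(env, params)  # hypothetical: one rollout's gradients
        server.apply_gradients(grads)           # exploration pauses while sharing
        if step % sync_every == 0:
            params = server.get_params()        # refresh the stale local clone
```

Note that in this pattern every actor must run its own backward pass on its own (typically CPU) resources; moving all gradient computation to a central learner is what lets the trajectory-passing design above batch that work on accelerators instead.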
