Identifying and eliminating bugs in learned predictive models

This is not an entirely new problem. Computer programs have always had bugs. Over decades, software engineers have assembled an impressive toolkit of techniques, ranging from unit testing to formal verification. These methods work well on traditional software, but adapting them to rigorously test machine learning models such as neural networks is extremely challenging because of the scale and lack of structure of these models, which may contain hundreds of millions of parameters. This calls for novel approaches to ensuring that machine learning systems are robust at deployment.

From a programmer’s perspective, a bug is any behaviour that is inconsistent with the specification, i.e. the intended functionality, of a system. As part of our mission of solving intelligence, we conduct research into techniques for evaluating whether machine learning systems are consistent not only with the training and test sets, but also with a list of specifications describing desirable properties of a system. Such properties might include robustness to sufficiently small perturbations in inputs, safety constraints to avoid catastrophic failures, or producing predictions consistent with the laws of physics.

In this article, we discuss three important technical challenges for the machine learning community to take on, as we collectively work towards rigorous development and deployment of machine learning systems that are reliably consistent with desired specifications:

  1. Testing consistency with specifications efficiently. We explore efficient ways to test that machine learning systems are consistent with properties (such as invariance or robustness) desired by the designer and users of the system. One approach to uncover cases where the model might be inconsistent with the desired behaviour is to systematically search for worst-case outcomes during evaluation.

  2. Training machine learning models to be specification-consistent. Even with copious training data, standard machine learning algorithms can produce predictive models that make predictions inconsistent with desirable specifications like robustness or fairness. This requires us to reconsider training algorithms so that they produce models that not only fit the training data well, but are also consistent with a list of specifications.

  3. Formally proving that machine learning models are specification-consistent. There is a need for algorithms that can verify that the model predictions are provably consistent with a specification of interest for all possible inputs. While the field of formal verification has studied such algorithms for several decades, these approaches do not easily scale to modern deep learning systems despite impressive progress.

Testing consistency with specifications

Robustness to adversarial examples is a relatively well-studied problem in deep learning. One major theme that has come out of this work is the importance of evaluating against strong attacks, and of designing transparent models which can be efficiently analysed. Alongside other researchers from the community, we have found that many models appear robust when evaluated against weak adversaries. However, they show essentially 0% adversarial accuracy when evaluated against stronger adversaries (Athalye et al., 2018; Uesato et al., 2018; Carlini and Wagner, 2017).
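The gap between weak and strong adversaries can be illustrated on a toy problem. The sketch below is not taken from the cited papers: it assumes a hypothetical linear classifier, for which the worst-case perturbation inside an L-infinity ball of radius `eps` is known exactly, and compares it with an adversary that only samples random perturbations from the same ball. All names and parameter values (`evaluate`, `dim`, `eps`, `tries`) are illustrative assumptions.

```python
import random

def predict(w, b, x):
    """Toy linear classifier: returns +1 or -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def sign(v):
    return (v > 0) - (v < 0)

def weak_attack_succeeds(w, b, x, y, eps, rng, tries=10):
    """Weak adversary: a few random perturbations inside the L-inf ball."""
    for _ in range(tries):
        x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
        if predict(w, b, x_adv) != y:
            return True
    return False

def strong_attack_succeeds(w, b, x, y, eps):
    """Strong adversary: for a linear model the worst case is exact.
    Shifting every coordinate by eps against the label changes the
    score by eps * ||w||_1, the largest change the ball allows."""
    x_adv = [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]
    return predict(w, b, x_adv) != y

def evaluate(dim=50, n_points=300, eps=0.1, seed=0):
    """Return (accuracy under weak attack, accuracy under strong attack)."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(dim)]
    b = 0.0
    points = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_points)]
    # Labels come from the model itself, so clean accuracy is 100%.
    labels = [predict(w, b, x) for x in points]
    weak = sum(
        not weak_attack_succeeds(w, b, x, y, eps, rng)
        for x, y in zip(points, labels)
    ) / n_points
    strong = sum(
        not strong_attack_succeeds(w, b, x, y, eps)
        for x, y in zip(points, labels)
    ) / n_points
    return weak, strong

if __name__ == "__main__":
    weak_acc, strong_acc = evaluate()
    print(f"accuracy under random perturbations:     {weak_acc:.2f}")
    print(f"accuracy under worst-case perturbations: {strong_acc:.2f}")
```

In this toy setting the random adversary reports a much higher robust accuracy than the exact worst-case adversary, even though both operate under the same perturbation budget. For deep networks the exact worst case is generally not available in closed form, which is why the strength of the attack used for evaluation matters so much.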

While most work has focused on rare failures in the context of supervised learning (largely image classification), there is a need to extend these ideas to other settings. In recent work on adversarial approaches for uncovering catastrophic failures, we apply these ideas towards testing reinforcement learning agents intended for use in safety-critical settings. One challenge in developing autonomous systems is that because a single mistake may have large consequences, very small failure probabilities are unacceptable.
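To see why random testing struggles with very small failure probabilities, consider a back-of-the-envelope calculation. The sketch below assumes independent test episodes with a fixed per-episode failure probability `p`, which is a simplification; the function name `trials_needed` is hypothetical. The probability of seeing no failure in `n` episodes is `(1 - p) ** n`, so observing at least one failure with a given confidence requires `n >= log(1 - confidence) / log(1 - p)`.

```python
import math

def trials_needed(failure_prob, confidence=0.95):
    """Number of independent random trials needed to observe at least one
    failure with the given confidence, assuming a fixed failure probability."""
    if not 0.0 < failure_prob < 1.0:
        raise ValueError("failure_prob must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # log1p(-p) computes log(1 - p) accurately for very small p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-failure_prob))

if __name__ == "__main__":
    for p in (1e-3, 1e-6):
        print(f"p = {p:g}: about {trials_needed(p):,} random episodes")
```

Under these assumptions, a failure probability of one in a million requires roughly three million random episodes before a failure is likely to show up even once, which is the motivation for searching for failures directly.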

Our objective is to design an “adversary” to allow us to detect such failures in advance (e.g., in a controlled environment). If the adversary can efficiently identify the worst-case input for a given model, this allows us to catch rare failure cases before deploying a model. As with image classifiers, evaluating against a weak adversary provides a false sense of security during deployment. This is similar to the software practice of red-teaming, though it extends beyond failures caused by malicious adversaries to include failures which arise naturally, for example due to lack of generalization.

We developed two complementary approaches for adversarial testing of RL agents. In the first, we use derivative-free optimisation to directly minimise the expected reward of an agent. In the second, we learn an adversarial value function which predicts from experience which situations are most likely to cause failures for the agent. We then use this learned function for optimisation to focus the evaluation on the most problematic inputs. These approaches form only a small part of a rich, growing space of potential algorithms, and we are excited about future development in rigorous evaluation of agents.
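As a rough illustration of the first approach, the sketch below uses a simple random local search, one of many derivative-free optimisers and not necessarily the one used in the work described here, to find environment settings that minimise an agent's estimated expected reward. The agent is replaced by a hypothetical stand-in: `mean_reward` is an invented reward landscape over two environment parameters in the unit square, and `rollout_reward` adds noise to mimic a single stochastic episode. Every name and constant is an assumption made for the example.

```python
import math
import random

def mean_reward(theta):
    """Hypothetical stand-in for the agent's true expected reward in the
    environment described by theta = [a, b]; low near (0.83, 0.17)."""
    dx, dy = theta[0] - 0.83, theta[1] - 0.17
    return 1.0 - math.exp(-(dx * dx + dy * dy) / 0.5)

def rollout_reward(theta, rng):
    """One noisy episode: the adversary only ever sees these samples."""
    return mean_reward(theta) + rng.gauss(0.0, 0.05)

def estimate_reward(theta, rng, n_rollouts=20):
    """Monte Carlo estimate of the expected reward from a few episodes."""
    return sum(rollout_reward(theta, rng) for _ in range(n_rollouts)) / n_rollouts

def adversarial_search(rng, steps=200, sigma=0.1):
    """Random local search: perturb the current worst environment and keep
    the candidate whenever its estimated reward is lower. No gradients of
    the agent or the environment are required."""
    best = [rng.random(), rng.random()]
    best_r = estimate_reward(best, rng)
    for _ in range(steps):
        # Gaussian proposal, clipped to the valid parameter range [0, 1].
        cand = [min(1.0, max(0.0, t + rng.gauss(0.0, sigma))) for t in best]
        cand_r = estimate_reward(cand, rng)
        if cand_r < best_r:
            best, best_r = cand, cand_r
    return best, best_r

if __name__ == "__main__":
    rng = random.Random(0)
    worst_env, _ = adversarial_search(rng)
    random_envs = [[rng.random(), rng.random()] for _ in range(200)]
    avg_random = sum(mean_reward(t) for t in random_envs) / len(random_envs)
    print("adversarial environment:", [round(t, 2) for t in worst_env])
    print("true reward there:", round(mean_reward(worst_env), 3))
    print("average true reward over random environments:", round(avg_random, 3))
```

Because the search accepts candidates on the basis of noisy estimates, the estimated reward of the final environment is biased low, so a found failure should be re-evaluated with fresh episodes before it is reported. The second approach in the text, a learned adversarial value function, would replace the expensive `estimate_reward` calls with a cheaper learned predictor of failure.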

Already, both approaches result in large improvements over random testing. Using our method, failures that would have taken days to uncover, or even gone undetected entirely, can be detected in minutes (Uesato et al., 2018b). We also found that adversarial testing may uncover qualitatively different behaviour in our agents from what might be expected from evaluation on a random test set. In particular, using adversarial environment construction we found that agents performing a 3D navigation task, which match human-level performance on average, still completely failed to find the goal on surprisingly simple mazes (Ruderman et al., 2018). Our work also highlights that we need to design systems that are secure against natural failures, not only against adversaries.

