Certified Multiplicative Weights Update, or Verified Learning Without Regret


The HCSS'17 Call asks how existing techniques can be used to build assurance cases for AI- and machine-learning-based systems. In this talk, I demonstrate that for certain methods in reinforcement learning, the subarea of AI in which an agent attempts to optimize a strategy over time against an adversarial environment, conventional tools such as interactive theorem provers suffice to prove both strong functional correctness and complexity guarantees.

As evidence, I'll present recent work from the Software Verification and Security Lab at Ohio University on a certified implementation of the Multiplicative Weights Update method, built in Ssreflect-Coq. MWU solves the general problem of "combining expert advice", in which an agent repeatedly chooses which action, or "expert", to play against an adaptive environment. The algorithm works by having the agent maintain a weighted distribution over the action space. As the costs of the actions played by the agent are learned over the course of repeated interactions with the environment, the agent updates its weighted action distribution with a linear or exponential loss function, penalizing poorly performing actions and making them less likely to be chosen in the future. This simple algorithm performs remarkably well: in a number of rounds logarithmic in the size of the action space, MWU's expected cost approaches, to within a small bound, that of the best fixed action the agent could have chosen in hindsight (in this sense, MWU is no-regret).
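The update rule itself fits in a few lines. The following Python sketch (illustrative only, not the talk's Ssreflect-Coq artifact; function and variable names are my own) shows the exponential-loss variant of MWU, assuming per-round costs in [0, 1] and a learning rate eta:

```python
import math

def mwu(costs, eta):
    """Multiplicative Weights Update, exponential-loss variant (illustrative sketch).

    costs: per-round cost vectors, one entry per action ("expert"),
           each cost assumed to lie in [0, 1].
    eta:   learning rate, assumed in (0, 1].
    Returns (cumulative expected cost, final action distribution).
    """
    n = len(costs[0])
    weights = [1.0] * n                      # start from the uniform distribution
    total = 0.0
    for cost in costs:
        z = sum(weights)
        dist = [w / z for w in weights]      # current (normalized) action distribution
        total += sum(p * c for p, c in zip(dist, cost))
        # penalize costly actions multiplicatively
        weights = [w * math.exp(-eta * c) for w, c in zip(weights, cost)]
    z = sum(weights)
    return total, [w / z for w in weights]
```

For example, over 100 rounds in which action 0 always costs 0 and action 1 always costs 1, the final distribution concentrates nearly all its mass on action 0, and the cumulative expected cost stays within the standard regret bound of eta*T + ln(n)/eta of the best fixed action in hindsight.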

The first part of the talk will introduce the MWU algorithm and present the high-level structure of our Ssreflect-Coq verification. The main takeaway here is that, while it may seem daunting to build assurance cases for AI methods like deep learning, algorithms with well-developed metatheory, such as MWU, are perfectly amenable to conventional verification techniques such as proof by refinement.

The second part will draw connections to the broader problem of verifying machine-learning-based distributed systems, via the application of MWU to the approximate solution of multi-agent games. No-regret algorithms such as MWU, when played by multiple independent agents, converge in expectation to a large equilibrium class known as Coarse Correlated Equilibria (CCEs). Although CCEs are not always optimal, it is possible -- for certain subclasses of games, such as Roughgarden's smooth games -- to bound their cost with respect to the optimum. Our broader research program, the Cage project, seeks to use such results to build game-based systems, such as distributed network routers and load balancers, with verified convergence and correctness properties by design.
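To make the convergence claim concrete, here is an illustrative Python simulation (again not from the talk; the game, parameters, and names are my own) of two MWU agents playing rock-paper-scissors against each other. The time-averaged regret of each agent stays below the standard MWU bound, and it is exactly this no-regret property that drives the empirical joint play toward a coarse correlated equilibrium:

```python
import math

# Rock-paper-scissors, rescaled so the row player's costs lie in [0, 1];
# the column player's cost is 1 minus the row player's (zero-sum).
A = [[0.5, 1.0, 0.0],
     [0.0, 0.5, 1.0],
     [1.0, 0.0, 0.5]]

def dist(w):
    z = sum(w)
    return [x / z for x in w]

eta, T = 0.1, 2000
w1, w2 = [1.0, 2.0, 3.0], [3.0, 1.0, 2.0]   # asymmetric starting weights
cum1, cum2 = [0.0] * 3, [0.0] * 3           # cumulative cost of each pure action
alg1 = alg2 = 0.0                            # cumulative cost of each MWU agent
for _ in range(T):
    p, q = dist(w1), dist(w2)
    # expected cost of each pure action against the opponent's current mix
    c1 = [sum(A[i][j] * q[j] for j in range(3)) for i in range(3)]
    c2 = [sum((1.0 - A[i][j]) * p[i] for i in range(3)) for j in range(3)]
    alg1 += sum(p[i] * c1[i] for i in range(3))
    alg2 += sum(q[j] * c2[j] for j in range(3))
    cum1 = [a + b for a, b in zip(cum1, c1)]
    cum2 = [a + b for a, b in zip(cum2, c2)]
    # multiplicative update for both agents
    w1 = [w * math.exp(-eta * c) for w, c in zip(w1, c1)]
    w2 = [w * math.exp(-eta * c) for w, c in zip(w2, c2)]

# per-round regret: the agent's average cost minus that of the
# best fixed action it could have played in hindsight
reg1 = (alg1 - min(cum1)) / T
reg2 = (alg2 - min(cum2)) / T
```

With these parameters, both per-round regrets fall below the standard bound eta + ln(3)/(eta*T), roughly 0.105, so the time-averaged joint play forms an approximate coarse correlated equilibrium of the game.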


Gordon Stewart is an Assistant Professor in the School of Electrical Engineering and Computer Science at Ohio University. He received his Ph.D. in Computer Science in 2015 from Princeton University, where he worked with Andrew Appel on the Verified Software Toolchain project. In 2013, he was a recipient of the Princeton School of Engineering and Applied Science's Wu Prize for Excellence.

Gordon's research interests span programming languages, interactive theorem proving, and software security. Most recently, he's been working on applications of ideas from algorithmic game theory and machine learning to the construction of distributed systems with verified convergence and correctness guarantees. He gratefully acknowledges the support of both the National Science Foundation and the Ohio Federal Research Network.

