Visible to the public Proof Robustness in ACL2


This experience report builds on 20 years of maintaining proofs for the ACL2 theorem prover.

Keeping old proofs working has many benefits. It let us start using, extending, or adapting them any time, without painful modernization to get them working with the latest version of the prover. And proofs about new versions of software and hardware artifacts are easier if one has access to working proofs of the old versions. Having a large suite of working proofs also helps evaluate proposed changes to supporting libraries and to the prover itself.

The ACL2 Community maintains a large and growing repository of proof libraries on GitHub ( It contains, very roughly, about 90,000 function definitions and about 174,000 theorems. These libraries are actively being extended and improved, at the rate of about 10 new commits per day. Kestrel Institute and other major users of ACL2 maintain their own large proof libraries. (Kestrel's internal proofs include many derivations of correct-by-construction software, many proofs about Java programs, etc.)

We endeavor to keep all of these proofs working, even as changes are made to 1) the precise definitions supporting the proofs (e.g., formal models, specifications, and deeply or shallowly embedded software or hardware artifacts), 2) the libraries of rules used in the proofs, 3) any supporting proof tools (connections to SMT solvers, custom theorem provers), and 4) ACL2 itself. Keeping proofs working requires infrastructure support, such as single commands to re-build large sets of proofs. Changes to the ACL2 libraries on Github must pass automated regression tests (building almost everything in the repository) before they are automatically merged into the master branch. Replaying all of our proofs takes a few hours, even on a machine with dozens of cores, but file-level dependency analysis prevents unaffected material from being rebuilt.

Automated build systems prevent "bit rot" (proof rot?), allowing new proof failures to be detected quickly. Diagnosing a proof that worked yesterday but fails today is often easy, since not much has changed. It can be much harder to un-break a proof after hundreds of changes have been made to the prover and its libraries.

Given all this, the ACL2 Community has developed an informal set of best practices for writing and maintaining robust proofs. ACL2 proofs are done by supplying hints to the prover, but certain constructs lead to brittle proofs and should be avoided when possible. These include 1) hints that depend on the detailed structure of the proof, and 2) hints that depend on the detailed structure of the logical terms involved. The former include hints attached to subgoals (such as "Subgoal 3.4", indicating the fourth subgoal generated from the third top-level goal generated from the conjecture). These are brittle because the precise decomposition of the proof into subgoals is likely to change. In many cases, such hints can simply be attached to "Goal" instead. Examples of #2 above include hints that mention particular pieces of syntax, such as terms given to instantiate a lemma, or to split a proof into cases. These will break if function names change, or if arguments are added, removed, or reordered. Similarly, Proof Builder instructions (like "go to the third argument of the fifth hypothesis") can also be brittle. Theory hints ("turn this rule on for the entire proof") tend to be a bit more robust. It is often helpful to replace detailed hints with explicit rewrite rules that will simplify patterns everywhere they appear, in the current proof and in all future proofs. ACL2 encourages this style of development.

We also seek robustness in proofs that are generated by tools, ensuring they will work in any future context. Here, the key techniques include very tightly controlling the set of rules and definitions used, making induction schemes explicit, and disabling ACL2 proof techniques that are not expected to contribute to the proof.

On a higher level, we advocate a lightweight approach to structuring files and directories of proofs. This involves limiting what is included into a development, to limit the effect of changes. Crucially, we also try to limit what is exported by each file of proofs. For example, if a development requires arithmetic reasoning but is not itself about arithmetic, it should not force its users to include the arithmetic library that it happens to use. ACL2's "local include" construct supports this.

Still, sometimes a proof fails after a change. In such cases, we can almost always get it working again. Having access to a working version of the proof (in an older checkout of the repository), or at least to the prover's output from a working version, can be a great help. One can compare the human-readable commentary printed for the working and failing proofs. Things to check include the induction scheme used, whether the same definitions were expanded, and whether the same rules were used. If more detail is required, the proofs can be re-run in a verbose mode. Individual subgoals from the failing proof can be even submitted into the old session, to see if they can be proved in the old environment. If rules that applied in the old proof now fail to fire, ACL2's "monitor" feature can help determine why. If there are rules that fire in the new proof but did not fire in the old one, one might need to disable them, but a better approach might be to formulate additional rules that get the proof working again.

Dr. Eric W. Smith is the CEO of Kestrel Institute, where he does research in formal methods. He co-leads Kestrel's APT (Automated Program Transformations) project, which is developing proof-emitting program transformations for the ACL2 theorem prover. APT is currently being used to synthesize network protocol implementations and high-assurance monitors for machine learning algorithms. Dr. Smith also works on the Axe tool originally written for his Stanford Ph.D. thesis. Axe can lift Java bytecode and x86 binary programs into a logical representation which is then verified using Axe's high-performance prover and equivalence checker. A newly released version of Axe can verify large rank-1 constraint systems used in zero-knowledge proofs. Dr. Smith led Kestrel's DerivationMiner effort, which used program synthesis techniques (derivations and refinements) to construct correctness proofs of code found in large online repositories. He also led Kestrel's DARPA APAC effort to find malware in Android apps and formally prove the correctness of apps without malware.

Creative Commons 2.5

Other available formats:

Proof Robustness in ACL2
Switch to experimental viewer