The Mechanically Verified Stack Challenge

J Strother Moore
Inman Chair and Chairman

Department of Computer Sciences
University of Texas at Austin
The Premise

The formal methods community is in danger of settling into comfortable niches far smaller than our potential suggests.
To maximize the impact of our science, we must demonstrate that mechanized formal methods can dramatically decrease time-to-market while producing correct systems.

This requires

\emph{pervasive use of formal methods throughout the abstraction hierarchy.}
The Challenge

Each major group in the formal methods community should design and mechanically verify a practical embedded system, from transistors to software.
A Personal Perspective

Instead of debugging a program, one should prove that it meets its specifications, and this proof should be checked by a computer program.

Boyer-Moore Project

McCarthy’s “Theory of Computation”
Edinburgh Pure Lisp Theorem Prover
A Computational Logic
NQTHM
ACL2

Boyer
Moore
Kaufmann
simple list processing

academic math and cs

commercial applications
1970s

(defun append (x y)
  (if (endp x)
      y
      (cons (car x)
            (append (cdr x) y))))

(theorem
  (equal (append (append a b) c)
         (append a (append b c))))
1980s Academic Math

- invertibility of RSA encryption
- undecidability of the halting problem
- Gödel’s First Incompleteness Theorem (Shankar)
- Gauss’ law of quadratic reciprocity (Russinoff)
1980s Academic CS

- microprocessor: gates to machine code (Hunt)
- assembler-linker-loader (Moore and Kaufmann)
- compiler (Young)
The CLI Stack

Procedure Mult(var loop
if K le 0
var K: int:= 0;
}

formal models related by mechanically checked proofs

0111010100011
0010010000011
01110100100111
0011101001001010
0101110111110011

FM9001 device

FM9001 machine code

Piton assembly language

Micro-Gypsy

INPUTS A, B, C;
OUTPUTS SUM, CARR LEVEL FUNCTION;
DEFINE T0 (SUM1, CARRY1) = H (SUM, CARRY2) =

die plot produced by LSI Logic, Inc, from verified NDL via conventional CAD tools

Micro-Gypsy

Piton assembly language

FM9001 machine code

Formal NDL netlist

fabricated FM9001 device

formal models related by mechanically checked proofs

0111010100011
0010010000011
01110100100111
0011101001001010
0101110111110011

die plot produced by LSI Logic, Inc, from verified NDL via conventional CAD tools
Additional Stack Components

- operating system (Bevier)
- Lisp compiler (Flatau)
- applications (Wilding)
• Applications were mechanically verified with respect to the high-level languages.

• Correct binary images were produced by mechanically verified tools.
  – compile
  – assemble
  – link
  – load
What’s Missing?

- The FM9001 was modeled at the gate level.

- The FM9001 had an unrealistically simple architecture: memory-mapped io, no pipeline, no speculation, no floating point, crude interrupts, no cache.
• None of the programming languages had io or floating point.

• The high-level languages were too simple to be of practical use.
  – The micro-Gypsy language had very few primitives and no dynamically allocated data types (e.g., records).
  – The Pure Lisp had automatic storage
allocation (e.g., cons) but no verified garbage collector.

- The compilers, assembler, linker, and loader were “cross-platform” transformers.

- No useful application programs were verified.
• The operating system was not hosted on the FM9001.
What Was Learned?

System verification does not require super-human skills.

It requires better tools than we had in 1985.
simple list processing

academic math and cs

commercial applications


stack project

ACL2 development
1990s

- Motorola 68020 and Berkeley C String Library
- Motorola CAP DSP
- AMD K5 FDIV
- AMD Athlon fp
- Rockwell Collins JEM1
- IBM Cryptographic coprocessor
- Union Switch & Signal and Argonne Nat’l Labs checkers
- FM9801
- JVM
- Java classes
- IBM Array Verification Tool
- Rockwell Collins CAPS
An Imaginary Stack

- applications
- Java
- JVM
- ISA
- microarchitecture
- gates
- transistors
applications
Java
JVM
ISA
microarchitecture
gates
transistors
class Container {
    public int counter; }

class Job extends Thread {
    Container objref;
    public Job incr () {
        synchronized(objref) {
            objref.counter = objref.counter + 1; }
        return this; }
    public void setref(Container o) {
        objref = o; }
    public void run() {
        for (;;) { incr(); } }
}
class Apprentice {
    public static void main(String[] args) {
        Container container = new Container();
        for (;;) {
            Job job = new Job();
            job.setref(container);
            job.start();
        }
    }
}
Java (Golden, Krug, Liu, Moore, Porter)

; JVM in ACL2
; J Moore and George Porter

(defun make-state (tt hp ct) ...) 

(defun step (th s) ...) 

(defun run (sched s) 
  (if (endp sched) s 
    (run (cdr sched) 
      (step (car sched) s)))))
(defun run (schedule state)
  (if (endp schedule)
      state
      (run (cdr schedule)
           (step (car schedule) state)))))
Our State: \(< tt, hp, ct \)
(defun step (th s)
   (if (equal (call-stack-status th s) 'SCHEDULED)
       (do-inst (next-inst th s) th s) s))
(defun do-inst (inst th s)
  (case (op-code inst)
    (AALOAD  (execute-AALOAD inst th s))
    (AASTORE (execute-AASTORE inst th s))
    (ALOAD   (execute-ALOAD inst th s))
    (ALOAD_0 (execute-ALOAD_X inst th s 0))
    (ALOAD_1 (execute-ALOAD_X inst th s 1))
    (ALOAD_2 (execute-ALOAD_X inst th s 2))
    (ALOAD_3 (execute-ALOAD_X inst th s 3))
    ...)))
(defun execute-AALOAD (inst th s)
  (let* ((frame (top-frame th s))
         (index (top (stack frame)))
         (aref (top (pop (stack frame))))
         (array (dereference aref (heap s))))
    (modify th s
      :pc (+ (inst-length inst) (pc frame))
      :stack
      (push (element-at index array)
        (pop (pop (stack frame)))))))
Some Java/JVM/ACL2 Code

public static int fact(int n) {
    if (n <= 0) return 1;
    else return n*fact(n-1);}

javac
Method int fact(int)
  0 iload_1
  1 ifgt 5
  4 iconst_1

jvm2acl2
("fact" (INT) nil (ILOAD_1) (IFGT 5) (ICONST_1))
```
"fact(5)=120"
```

```
"fact(n)=n!"
```
applications
Java
JVM
ISA
microarchitecture
gates
transistors
The world’s first silicon Java Virtual Machine was first modeled in ACL2.
The formal model is executable.

It was used in place of a C simulator for requirements and certification testing.

It runs at 90% the speed of the C simulator.
Motorola CAP DSP (Brock)
ROM containing 50 microcoded DSP algorithms

Pipelined microarchitecture

Sequential microcode ISA

(if no hazards)
Rockwell Collins Avionics

Microprocessor A

==

Microprocessor B

microcode
ROM
**AMD K5 Algorithm** \(FDIV(p, d, mode)\)

1. \(sd_0 = \text{lookup}(d)\) \[\text{exact 17 8}\]
2. \(d_r = d\) \[\text{away 17 32}\]
3. \(sdd_0 = sd_0 \times d_r\) \[\text{away 17 32}\]
4. \(sd_1 = sd_0 \times \text{comp}(sdd_0, 32)\) \[\text{trunc 17 32}\]
5. \(sdd_1 = sd_1 \times d_r\) \[\text{away 17 32}\]
6. \(sd_2 = sd_1 \times \text{comp}(sdd_1, 32)\) \[\text{trunc 17 32}\]

... \[\text{... = ...}\]

29. \(q_3 = sd_2 \times ph_3\) \[\text{trunc 17 24}\]
30. \(qq_2 = q_2 + q_3\) \[\text{sticky 17 64}\]
31. \(qq_1 = qq_2 + q_1\) \[\text{sticky 17 64}\]
32. \(fdiv = qq_1 + q_0\) \(mode\)
Using the Reciprocal

\[
\begin{align*}
36. & \\
+ & -1.7 \\
+ & 0.034 \\
+ & -0.00066 \\
35.833334 \hfill & \text{Reciprocal Calculation:} \\
\sqrt{12} & 430.000000 \\
432. & \text{Quotient Digit Calculation:} \\
-2. & 0.083 \times 430.0000 = 35.6900000 \approx 36.000000 = q_0 \\
-2.04 & 0.083 \times -2.0000 = -166000 \approx -170000 = q_1 \\
-2.04 & 0.083 \times 0.0400 = 0.003200 \approx 0.003400 = q_2 \\
-2.04 & 0.083 \times -0.008 = -0.000664 \approx -0.00067 = q_3 \\
0.04 & \text{Summation of Quotient Digits:} \\
0.408 & q_0 + q_1 + q_2 + q_3 = 35.833333 \\
-0.008 & \hfill \\
-0.0008 & \\
-0.000792 & \\
-0.000008 & \\
\end{align*}
\]
Computing the Reciprocal

\[ y = \frac{1}{x} - d \]

\[ \frac{dy}{dx} = -x^{-2} \]

\[ sd_{i+1} = sd_i \left(2 - sd_i \cdot d\right) \]
<table>
<thead>
<tr>
<th>top 8 bits of $d$</th>
<th>approx inverse</th>
<th>top 8 bits of $d$</th>
<th>approx inverse</th>
<th>top 8 bits of $d$</th>
<th>approx inverse</th>
<th>top 8 bits of $d$</th>
<th>approx inverse</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.00000002</td>
<td>0.1111111112</td>
<td>1.01000002</td>
<td>0.110011002</td>
<td>1.10000002</td>
<td>0.101010102</td>
<td>1.11000002</td>
<td>0.100100102</td>
</tr>
<tr>
<td>1.00000012</td>
<td>0.1111111112</td>
<td>1.01000012</td>
<td>0.110010112</td>
<td>1.10000012</td>
<td>0.101010012</td>
<td>1.11000012</td>
<td>0.100100012</td>
</tr>
<tr>
<td>1.00000102</td>
<td>0.1111111112</td>
<td>1.01000102</td>
<td>0.110010012</td>
<td>1.10000102</td>
<td>0.101010002</td>
<td>1.11000102</td>
<td>0.100100012</td>
</tr>
<tr>
<td>1.00000112</td>
<td>0.1111101012</td>
<td>1.01001012</td>
<td>0.110001102</td>
<td>1.10001002</td>
<td>0.101001112</td>
<td>1.11010012</td>
<td>0.101011112</td>
</tr>
<tr>
<td>1.00001002</td>
<td>0.1111100112</td>
<td>1.01010012</td>
<td>0.110000112</td>
<td>1.10010012</td>
<td>0.101001112</td>
<td>1.11100012</td>
<td>0.10111112</td>
</tr>
<tr>
<td>1.00001102</td>
<td>0.1111010012</td>
<td>1.01010102</td>
<td>0.110000012</td>
<td>1.10100102</td>
<td>0.101001012</td>
<td>1.11110012</td>
<td>0.101111012</td>
</tr>
<tr>
<td>1.00001012</td>
<td>0.1111001112</td>
<td>1.01010102</td>
<td>0.110000002</td>
<td>1.10100102</td>
<td>0.101001012</td>
<td>1.11111012</td>
<td>0.101111012</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1.00101012</td>
<td>0.1101011012</td>
<td>1.01011012</td>
<td>0.101010012</td>
<td>1.10101012</td>
<td>0.100110012</td>
<td>1.11110012</td>
<td>0.100001012</td>
</tr>
<tr>
<td>1.00101112</td>
<td>0.1101100012</td>
<td>1.01101112</td>
<td>0.101100112</td>
<td>1.10110012</td>
<td>0.100101112</td>
<td>1.11111012</td>
<td>0.100001112</td>
</tr>
<tr>
<td>1.00110012</td>
<td>0.1101011112</td>
<td>1.01110012</td>
<td>0.101100012</td>
<td>1.10110102</td>
<td>0.100101112</td>
<td>1.11111002</td>
<td>0.100001012</td>
</tr>
<tr>
<td>1.00110102</td>
<td>0.1101001012</td>
<td>1.01110102</td>
<td>0.101010002</td>
<td>1.10110102</td>
<td>0.100101012</td>
<td>1.11111002</td>
<td>0.1000010012</td>
</tr>
<tr>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>1.00111012</td>
<td>0.1100100012</td>
<td>1.01111012</td>
<td>0.101010112</td>
<td>1.10111002</td>
<td>0.100101012</td>
<td>1.11111102</td>
<td>0.100000012</td>
</tr>
<tr>
<td>1.00111112</td>
<td>0.1100110012</td>
<td>1.01111112</td>
<td>0.101010012</td>
<td>1.10111102</td>
<td>0.100101012</td>
<td>1.11111112</td>
<td>0.1000000012</td>
</tr>
</tbody>
</table>
The Formal Model of the Code

(defun FDIV (p d mode)
  (let*
    ((sd0 (eround (lookup d)) '(exact 17 8))
     (dr (eround d) '(away 17 32))
     (sdd0 (eround (* sd0 dr)) '(away 17 32))
     (sd1 (eround (* sd0 (comp sdd0 32))) '(trunc 17 32))
     (sdd1 (eround (* sd1 dr)) '(away 17 32))
     (sd2 (eround (* sd1 (comp sdd1 32))) '(trunc 17 32))
...
     (qq2 (eround (+ q2 q3)) '(sticky 17 64))
     (qq1 (eround (+ qq2 q1)) '(sticky 17 64))
     (fdiv (round (+ qq1 q0) mode))
    (or (first-error sd0 dr sdd0 sd1 sdd1 ... fdiv)
        fdiv)))
The K5 FDIV Theorem

(defthm FDIV-divides
  (implies (and (floating-point-numberp p 15 64)
                (floating-point-numberp d 15 64)
                (not (equal d 0))
                (rounding-modep mode))
            (equal (FDIV p d mode)
                   (round (/ p d) mode))))

(by Moore, Lynch and Kaufmann, in 1995, before the K5 was fabricated)
AMD ATHLON™ ROCKETS TO 1GHz
All elementary floating-point operations on the **AMD Athlon** were

- specified in ACL2 to be IEEE compliant,
- proved to meet their specifications, and
- the proofs were checked mechanically – before fab.

(Russinoff and Flatau)
module FMUL;  // sanitized from AMD Athlon(TM)
       // by David Russinoff and Art Flatau
/********************************************************/
// Declarations
/********************************************************/
//Precision and rounding control:
‘define SNG 1'b0    // single precision
‘define DBL 1'b1    // double precision
‘define NRE 2'b00   // round to nearest
‘define NEG 2'b01   // round to minus infinity
‘define POS 2'b10   // round to plus infinity
// Parameters:
input x[79:0];       // first operand
input y[79:0];       // second operand
input rc[1:0];       // rounding control
input pc;            // precision control
output z[79:0];      // rounded product

// First Cycle

// Operand fields:
sgnx = x[79]; sngy = y[79];
expx[14:0] = x[78:64]; expy[14:0] = y[78:64];
RTL \rightarrow \text{proofs} \quad \text{ACL2} \quad \text{AMD} \quad \text{RTL sim} \quad \text{fabrication}
Summary of Our Current State

We have mechanically verified

- RTL for industrial designs,
- commercial microcode, and
- simple programs in a widely used programming language.
In each industrial site where ACL2 is being used, proprietary formal information has been created:

- models of artifacts,
- definition of “in-house” concepts,
- specifications, and
- lemmas and proof strategies.
Our lowest level formal models are usually

- “bit accurate,”
- “cycle accurate,” and
- efficiently executable (comparable to C).
Our formal models often replace conventional simulation models:

- they are as accurate and
- they run about as fast, but
- they can be formally analyzed.

Formal models add value.
The evidence supports

- the value-added proposition,
- the claim that formal methods requires less manpower than testing, and
- the claim that formal methods reduces time-to-market.
We have certified books (lemma libraries) for

- arithmetic and bit-vectors,
- stuttering bi-simulation,
- a BDD package (60% CUDD speed), and
- a sound and complete mu-calculus model checker.
The Challenge

Each major group in the formal methods community should design and mechanically verify a practical embedded system, from transistors to software.
Why Build a Stack?

- There is still much to be learned.
- Practice makes perfect (more accurately, practice encourages mechanization).
- A stack forces you to deliver what you assume.
The Rules of the Game

- The project should produce an artifact, e.g., a chip.

- The artifact’s behavior should be of interest to people not in formal methods.

- The artifact should come with a “warranty” expressed as a theorem.
• The warranty should be certified mechanically; user input and interaction are allowed but a machine must be responsible for the soundness of the claim.

• The machine used to certify the warranty should be available for others, at least, to use and test, if not inspect.
Why It is Hard

There are unsolved technical problems.

But mainly:

None of our systems allow us to move up and down the abstraction hierarchy with the ease required to build a practical system.
The Challenge Will Encourage

• more automation,

• integration of formal approaches,

• development of collaborative environments,
• identification of assumptions,
• specification of crucial system interfaces, and
• elegant designs of practical systems.