Papers in Computer Science

Discussion of computer science publications

Archive for the ‘Formal verification’ Category

A Machine Program for Theorem-Proving

Posted by dcoetzee on March 14, 2009

Citation: Davis, M., Logemann, G., and Loveland, D. A machine program for theorem-proving. Communications of the ACM 5, 7 (Jul. 1962), 394-397. (PDF)

Abstract: The programming of a proof procedure is discussed in connection with trial runs and possible improvements.

Discussion: This short 1962 paper introduced the Davis-Logemann-Loveland (DLL) or Davis-Putnam-Logemann-Loveland (DPLL) algorithm, the basic algorithm for solving the boolean satisfiability (SAT) problem; it remains the foundation of all the most efficient SAT solvers used today. The paper shows it age with obsolete terminology and special attention to implementation details, making it difficult to read in some places, but the algorithm itself is straightforward to describe in modern terms.

In the boolean satisfiability problem, we are given a formula made up of three operations, AND, OR, and NOT, and a set of boolean variables, each of which may take on either the value true or false. The goal is to determine whether there is a particular assignment of true-false values to the variables that will make the formula as a whole true. For example, given this formula:

(x1 OR NOT(x2)) AND (NOT(x1) OR x2),

a satisfying assignment would be x1=true,x2=true. If a formula has no satisfying assignments, it is said to be unsatisfiable; the name of this paper comes from the useful fact that many simple theorems can be proven to be true by showing that a particular boolean formula is unsatisfiable.

The most naive algorithm for solving the problem is exhaustive search, trying every possible combination of truth values; but if there are n variables, there are 2n of these.

Another simple but more effective method is backtracking search: choose a value for one variable, plug it in the formula, and then simplify the formula. There are many ways to simplify the formula, but one obvious way is using boolean identities such as NOT true = false, NOT false = true, (x AND false) = false, and (x OR true) = true. Then choose a value for another variable and continue in this manner until the formula reduces to either true or false. If it reduces to true, a satisfying assignment has been found; otherwise, we backtrack to the last variable choice (that we have not already tried both values for) and switch it to the other value. A conceptual binary tree is constructed, with a variable at each node, and each edge representing an assignment of a value to a variable; this tree is called the search tree, and the search effectively performs an in-order traversal of it. The leaves are either true or false. If the formula is unsatisfiable, eventually it will walk the entire tree without encountering a true leaf.

The trouble with backtracking search is that if the variables are chosen in a poor order, the formula won’t reduce very much, the search tree will be large, and the runtime will still be similar to exhaustive search. To deal with this, the early Davis-Putnam (DP) method took a different approach: they observed that any boolean formula could be reduced to a canonical form called conjunctive normal form (CNF). In this form, the formula is written as an

  • A literal is defined as either a variable or its negative (NOT(x)).
  • A clause is defined as an OR (disjunction) over one or more literals.
  • The formula is written as an AND (conjunction) of clauses.

The example formula above was in this form. This form is convenient because it’s expressive enough to express many interesting constraints, but also simple enough to build more efficient algorithms for. The original techniques used by the authors to convert formulas to conjunctive normal form actually do not scale in some cases, but later it would be shown that there is a way to efficiently reduce any boolean formula to a formula in conjunctive normal form that isn’t too much larger.

Once the formula is in this form, we can improve backtracking search with an important heuristic called unit propagation: the idea is to search the formula for clauses containing only a single literal (either x or NOT(x)); these are called unit clauses. Because this literal must be true to satisfy the clause, and every clause must be true to satisfy the formula, the value of that variable is determined, so we can assign that variable a single value and simplify the formula. Unit clauses rarely appear in the original formula, but often pop up as we begin to assign some of the variables and simplify it. The effect compared to ordinary backtracking search is that the search tree has some nodes with only one child, reducing its average degree and the overall search time. This is essentially the DPLL algorithm in its simplest form.

The DPLL technique was intended as an improvement over the classical Davis-Putnam (DP) algorithm from 1960. The DP algorithm was also based on conjunctive normal form, but instead of performing a backtracking search, it eliminated a variable x by separating the clauses into three groups: the clauses A containing x (without negation), the clauses B containing NOT(x), and the clauses R containing neither. Then, the original formula is unsatisfiable if and only if (A OR B) AND R is unsatisfiable. Unfortunately, once this is reduced back to conjunctive normal form, it may become much longer; if this is done several times, the formula can become rapidly very long and unwieldly. DPLL provided an early example of a space-time tradeoff: instead of checking (A OR B) AND R for unsatisfiability, it checks first A AND R, then B AND R; both of these are already in conjunctive normal form, so there’s no formula size blowup. This was even more important on computers of the day with limited storage; computational complexity was not yet born then, so much of their discussion of this matter is informal.

Another technique introduced in DPLL that has since fallen out of favor is pure literal elimination: this says that if a variable is either never negated or always negated, it can always be assigned the value that makes all its literals true. This similarly reduces average search time, but only in cases where this occurs, which are rare in practice.

This simple scheme is amenable to many improvements: if there are no unit clauses, we still have great freedom in choosing which variables to reduce in which order; we can visit the nodes of the search tree in different orders than a simple in-order search, accelerating discovery of a satisfying assignment; we can perform more sophisticated simplification of the formula at each step, shrinking the tree size. All of these can dramatically affect overall running time, and together modern improvements like this have allowed SAT solvers to scale to formulas with millions of clauses. But because the SAT problem is NP-complete, there is no known general algorithm that is efficient on all instances; even the best methods will encounter instances on which they make poor variable choices and end up spending immense amounts of computing time searching a large search tree.

BPLL schemes were effective at the time because 1960s computers had a single CPU and were memory-starved. Many modern speedup efforts central around parallelization of the traversal of the search tree. An example of this is massively parallel SAT solvers based on FPGAs, which replicate a simple circuit many times to traverse part of the search space in parallel with many other simple solvers. As we move out of the age of increasing clock speed and into the age of parallel systems, solvers that can exploit parallelism effectively will become more important.

An interesting question to ask is: could there be an efficient variant of a scheme like the original BP that expands the formula at each step, but requires no backtracking? On modern systems with much greater memory and external storage capacities, this might be a win on certain instances. I haven’t seen any work along these lines.

The author releases all rights to all content herein and grants this work into the public domain, with the exception of works owned by others such as abstracts, quotations, and WordPress theme content.

Posted in Algorithms and optimization, Formal verification | Tagged: , , , , | 1 Comment »

Graph-based algorithms for Boolean function manipulation

Posted by dcoetzee on March 2, 2009

Citation: Bryant, R. E. 1986. Graph-Based Algorithms for Boolean Function Manipulation. IEEE Trans. Comput. 35, 8 (Aug. 1986), 677-691. (PDF)

Abstract: In this paper we present a new data structure for representing Boolean functions and an associated set of manipulation algorithms. Functions are represented by directed, acyclic graphs in a manner similar to the representations introduced by Lee [1] and Akers [2], but with further restrictions on the ordering of decision variables in the graph. Although a function requires, in the worst case, a graph of size exponential in the number of arguments, many of the functions encountered in typical applications have a more reasonable representation. Our algorithms have time complexity proportional to the sizes of the graphs being operated on, and hence are quite efficient as long as the graphs do not grow too large. We present experimental results from applying these algorithms to problems in logic design verification that demonstrate the practicality of our approach.

Discussion: This widely-cited 1986 paper introduced the notion of the reduced ordered binary decision diagram (ROBDD).

Let’s start with some background. A boolean function is a function that takes a set of boolean inputs (conventionally labelled x1 through xn) and produces a boolean (true/false) output. A binary decision diagram is a way of compactly representing a boolean function as a directed graph: each node tests a particular input parameter to see whether it’s true or false, and then proceeds to one of two child vertices depending on the result. Eventually, it reaches a terminal node labelled either true or false, and that’s the result.

BDDs can also be thought of as branching programs, functions that do nothing other than test boolean variables using if-else statements, return true or false, and call other boolean functions satisfying these constraints, as in this C++ example for the XOR function of 3 variables:

bool xor3(bool x1, bool x2, bool x3) {
    if (x1) {
        if (xor2(x2, x3))
            return false;
        else
            return true;
    } else {
        if (xor2(x2, x3))
            return true;
        else
            return false;
    }
}

bool xor2(bool x1, bool x2) {
    if (x1) {
        if (x2)
            return false;
        else
            return true;
    } else {
        if (x2)
            return true;
        else
            return false;
    }
}

BDDs have some nice properties: they can represent any boolean function, and they can compactly represent most boolean functions that occur in practice. It’s easy to complement them (just complement the terminal nodes) or combine them with AND, OR, or other boolean operations. Unfortunately, they’re so general that many problems involving them are computationally intractable. The diagram may contain cycles and fail to terminate, and some paths may be unreachable, as in these snippets:

bool nonterminating(bool x1) {
    if (x1) {
        if (nonterminating(x1)) {
            return true;
        } else {
            return false;
        }
    } else {
        return false;
    }
}

bool unreachable(bool x1, bool x2) {
    if (x1) {
        return false;
    } else {
        if (x1)
            return true;
        else
            return false;
    }
}

The above branching function unreachable() always returns false, since the “return true” statement is unreachable; but trying to determine in general whether or not a branching function always return false (satisfiability) is a difficult (NP-hard) problem. Likewise, trying to determine if two BDDs or branching programs compute the same boolean function is coNP-hard. Other difficult problems are generating a particular input that will cause a BDD to return true, or counting the number of inputs that make it return true.

BDDs are classical, dating back to a 1959 work by C. Y. Lee (Representation of switching circuits by binary-decision programs. Bell System Technical Journal, vol. 38, July 1959, pp 985-999). A 1978 work by S. B. Akers popularized BDDs and optimized them by reducing their size heuristically, but did not eliminate the issues described above (Binary decision diagrams. IEEE Transactions on Computers, Vol. C-27, No. 6, June 1978, pp. 509-516).

Bryant’s insight with reduced ordered BDDs (ROBDDs) is that by imposing additional constraints on BDDs, we can force them into a canonical form: any two ROBDDs representing the same boolean function will be identical. This means that it’s trivial to determine whether two ROBDDs are equivalent. This also makes the satisfiability problem simple (an ROBDD always return false only if it identical to the trivial ROBDD that always returns false). There are also efficient algorithms for ROBDDs to give a satisfying input (one that causes it to return true) and to count the number of satisfying inputs.

Compared to general BDDs, the main limitation of ROBDDs is that some boolean functions will require much larger graphs; in fact, any ROBDD describing integer multiplication functions will necessarily be of exponential size, as later proven by Bryant in 1991 (On the complexity of VLSI implementations and graph representations of boolean functions with application to integer multiplication. IEEE Transactions on Computers 40-2, pp.205-213).

The first constraint, the “ordered” part of reduced ordered BDDs, limits the order in which input parameters can be tested: an ROBDD cannot test an input multiple times, and must test inputs in increasing order by index. For example, if it has already tested x2, it can’t go back and test x1. This implies that cycles and nontermination can’t exist (an ROBDD is a directed acyclic graph).

A side effect of the ordered constraint is that the size and complexity of the representation depends critically on the ordering of the inputs; because the runtime of the algorithms that manipulate ROBDDs depends on the size of the graph, it’s important to choose a good ordering. Finding an optimal ordering is coNP-complete, so heuristics have to be used for this.

The second constraint, the “reduced” part of reduced order BDDs, specifies that the BDD graph must have the smallest possible number of nodes needed to represent the function. Because of the ordering constraint, there is an efficient algorithm to do this (section 4.2). Let the subgraph rooted at v be the set of all nodes reachable from node v by directed edges. A reduced BDD has two properties: 1. no node transitions to the same node regardless of whether the variable tested is true or false; 2. there are no two nodes such that the subgraphs rooted at those nodes are ismorphic. In the first case, because the test has no influence on the result, the test can simply be removed and all incoming edges directed to the next state. In the second case, all incoming edges into the two nodes with ismorphic subgraphs can be redirected to just one of them, and then any nodes with in-degree zero can be removed. To identify isomorphic subgraphs in the first place, the graph isomorphism algorithm for directed acyclic graphs is used.

ROBDDs are a general data structure, but they’re most useful in the contexts of logic synthesis and formal verification. In logic synthesis, their small size helps to generate a small logic circuit to implement a given boolean function. In formal verification, one can construct a formula representing whether a system has a desired property as a BDD, and then see whether that formula is ever false; if it is, it can given an example of a specific situation that makes it false and facilitates diagnosis. In this context, the ability of BDDs to describe a large, complex boolean function implicitly and succinctly is of benefit.

A large part of the effort going into this work (Section 5) was demonstrating that good input orderings could be found that heuristically would tend to produce small ROBDDs. Bryant sought to demonstrate this empirically by applying them to a variety of practical scenarios; in this paper, he limited his evaluation to “the problem of verifying that the implementation of a logic function (in terms of a combinatorial logic gate network) satisfies its specification (in terms of boolean expressions).” Essentially he verified that the addition and logical operation circuits of some arithmetic logic units (ALUs) were correct. The resulting verification took only a few minutes, even on computers of the time, but was only efficient “as long as we choose an ordering in which the successive bits of the input words are interleaved.” In 1992 he would follow up with a 26-page survey going into considerably more detail about specific implementation techniques, encoding of systems as boolean formulas, and applications in digital system design, finite-state system analysis, and truth maintenance systems (Symbolic Boolean manipulation with ordered binary-decision diagrams. ACM Comput. Surv. 24, 3 (Sep. 1992), 293-318.)

A number of alternatives to BDDs in system verification have been proposed; one of the most popular, propositional decision procedures, was evaluated by Biere, Cimatti, and Clarke in 1999 ( Biere, A., Cimatti, A., Clarke, E. M., and Zhu, Y. Symbolic Model Checking without BDDs. In Proceedings of the 5th international Conference on Tools and Algorithms For Construction and Analysis of Systems. PDF). This construction takes advantage of the fact that most verification procedures can be reduced to the problem of determining whether a given boolean formula is satisfiable; once this formula is generated, a boolean satisfiability algorithm (or SAT solver) can be applied to systematically search for an input that makes the formula true. This creates a time-space tradeoff: BDDs rapidly exceed space constraints for certain problems or when the input order is not chosen correctly, but as long as the graphs remain small they can determine satisfiability efficiently. SAT solvers allow smaller representations, but cannot reliably produce solutions for all possible formulas because the boolean satisfiability problem is NP-complete. Nevertheless, as SAT solvers have become more sophisticated and CPUs have become faster, this technique has begun to rapidly dominate the traditional approach of using BDDs.

The author releases all rights to all content herein and grants this work into the public domain, with the exception of works owned by others such as abstracts, quotations, and WordPress theme content.

Posted in Formal verification | Tagged: , , , | 1 Comment »