A Machine Program for Theorem-Proving
Posted by dcoetzee on 14th March 2009
Citation: Davis, M., Logemann, G., and Loveland, D. A machine program for theorem-proving. Communications of the ACM 5, 7 (Jul. 1962), 394-397. (PDF)
Abstract: The programming of a proof procedure is discussed in connection with trial runs and possible improvements.
Discussion: This short 1962 paper introduced the Davis-Logemann-Loveland (DLL) or Davis-Putnam-Logemann-Loveland (DPLL) algorithm, the basic algorithm for solving the boolean satisfiability (SAT) problem; it remains the foundation of all the most efficient SAT solvers used today. The paper shows it age with obsolete terminology and special attention to implementation details, making it difficult to read in some places, but the algorithm itself is straightforward to describe in modern terms.
In the boolean satisfiability problem, we are given a formula made up of three operations, AND, OR, and NOT, and a set of boolean variables, each of which may take on either the value true or false. The goal is to determine whether there is a particular assignment of true-false values to the variables that will make the formula as a whole true. For example, given this formula:
(x1 OR NOT(x2)) AND (NOT(x1) OR x2),
a satisfying assignment would be x1=true,x2=true. If a formula has no satisfying assignments, it is said to be unsatisfiable; the name of this paper comes from the useful fact that many simple theorems can be proven to be true by showing that a particular boolean formula is unsatisfiable.
The most naive algorithm for solving the problem is exhaustive search, trying every possible combination of truth values; but if there are n variables, there are 2n of these.
Another simple but more effective method is backtracking search: choose a value for one variable, plug it in the formula, and then simplify the formula. There are many ways to simplify the formula, but one obvious way is using boolean identities such as NOT true = false, NOT false = true, (x AND false) = false, and (x OR true) = true. Then choose a value for another variable and continue in this manner until the formula reduces to either true or false. If it reduces to true, a satisfying assignment has been found; otherwise, we backtrack to the last variable choice (that we have not already tried both values for) and switch it to the other value. A conceptual binary tree is constructed, with a variable at each node, and each edge representing an assignment of a value to a variable; this tree is called the search tree, and the search effectively performs an in-order traversal of it. The leaves are either true or false. If the formula is unsatisfiable, eventually it will walk the entire tree without encountering a true leaf.
The trouble with backtracking search is that if the variables are chosen in a poor order, the formula won’t reduce very much, the search tree will be large, and the runtime will still be similar to exhaustive search. To deal with this, the early Davis-Putnam (DP) method took a different approach: they observed that any boolean formula could be reduced to a canonical form called conjunctive normal form (CNF). In this form, the formula is written as an
- A literal is defined as either a variable or its negative (NOT(x)).
- A clause is defined as an OR (disjunction) over one or more literals.
- The formula is written as an AND (conjunction) of clauses.
The example formula above was in this form. This form is convenient because it’s expressive enough to express many interesting constraints, but also simple enough to build more efficient algorithms for. The original techniques used by the authors to convert formulas to conjunctive normal form actually do not scale in some cases, but later it would be shown that there is a way to efficiently reduce any boolean formula to a formula in conjunctive normal form that isn’t too much larger.
Once the formula is in this form, we can improve backtracking search with an important heuristic called unit propagation: the idea is to search the formula for clauses containing only a single literal (either x or NOT(x)); these are called unit clauses. Because this literal must be true to satisfy the clause, and every clause must be true to satisfy the formula, the value of that variable is determined, so we can assign that variable a single value and simplify the formula. Unit clauses rarely appear in the original formula, but often pop up as we begin to assign some of the variables and simplify it. The effect compared to ordinary backtracking search is that the search tree has some nodes with only one child, reducing its average degree and the overall search time. This is essentially the DPLL algorithm in its simplest form.
The DPLL technique was intended as an improvement over the classical Davis-Putnam (DP) algorithm from 1960. The DP algorithm was also based on conjunctive normal form, but instead of performing a backtracking search, it eliminated a variable x by separating the clauses into three groups: the clauses A containing x (without negation), the clauses B containing NOT(x), and the clauses R containing neither. Then, the original formula is unsatisfiable if and only if (A OR B) AND R is unsatisfiable. Unfortunately, once this is reduced back to conjunctive normal form, it may become much longer; if this is done several times, the formula can become rapidly very long and unwieldly. DPLL provided an early example of a space-time tradeoff: instead of checking (A OR B) AND R for unsatisfiability, it checks first A AND R, then B AND R; both of these are already in conjunctive normal form, so there’s no formula size blowup. This was even more important on computers of the day with limited storage; computational complexity was not yet born then, so much of their discussion of this matter is informal.
Another technique introduced in DPLL that has since fallen out of favor is pure literal elimination: this says that if a variable is either never negated or always negated, it can always be assigned the value that makes all its literals true. This similarly reduces average search time, but only in cases where this occurs, which are rare in practice.
This simple scheme is amenable to many improvements: if there are no unit clauses, we still have great freedom in choosing which variables to reduce in which order; we can visit the nodes of the search tree in different orders than a simple in-order search, accelerating discovery of a satisfying assignment; we can perform more sophisticated simplification of the formula at each step, shrinking the tree size. All of these can dramatically affect overall running time, and together modern improvements like this have allowed SAT solvers to scale to formulas with millions of clauses. But because the SAT problem is NP-complete, there is no known general algorithm that is efficient on all instances; even the best methods will encounter instances on which they make poor variable choices and end up spending immense amounts of computing time searching a large search tree.
BPLL schemes were effective at the time because 1960s computers had a single CPU and were memory-starved. Many modern speedup efforts central around parallelization of the traversal of the search tree. An example of this is massively parallel SAT solvers based on FPGAs, which replicate a simple circuit many times to traverse part of the search space in parallel with many other simple solvers. As we move out of the age of increasing clock speed and into the age of parallel systems, solvers that can exploit parallelism effectively will become more important.
An interesting question to ask is: could there be an efficient variant of a scheme like the original BP that expands the formula at each step, but requires no backtracking? On modern systems with much greater memory and external storage capacities, this might be a win on certain instances. I haven’t seen any work along these lines.
The author releases all rights to all content herein and grants this work into the public domain, with the exception of works owned by others such as abstracts, quotations, and WordPress theme content.
Tags: dpll, sat, sat solver, satisfiability, theorem proving
Posted in Algorithms and optimization, Formal verification | 2 Comments »