Papers in Computer Science

Discussion of computer science publications

Archive for the ‘Algorithms and optimization’ Category

The Soft Heap: An Approximate Priority Queue with Optimal Error Rate

Posted by dcoetzee on July 6, 2009

Citation: Chazelle, B. The soft heap: an approximate priority queue with optimal error rate. Journal of the ACM 47, 6 (Nov. 2000), 1012-1027. (PDF)

Abstract: A simple variant of a priority queue, called a soft heap, is introduced. The data structure supports the usual operations: insert, delete, meld, and findmin. Its novelty is to beat the logarithmic bound on the complexity of a heap in a comparison-based model. To break this information-theoretic barrier, the entropy of the data structure is reduced by artificially raising the values of certain keys. Given any mixed sequence of n operations, a soft heap with error rate ε (for any 0 < ε ≤ 1/2) ensures that, at any time, at most εn of its items have their keys raised. The amortized complexity of each operation is constant, except for insert, which takes O(log 1/ε) time. The soft heap is optimal for any value of ε in a comparison-based model. The data structure is purely pointer-based. No arrays are used and no numeric assumptions are made on the keys. The main idea behind the soft heap is to move items across the data structure not individually, as is customary, but in groups, in a data-structuring equivalent of “car pooling.” Keys must be raised as a result, in order to preserve the heap ordering of the data structure. The soft heap can be used to compute exact or approximate medians and percentiles optimally. It is also useful for approximate sorting and for computing minimum spanning trees of general graphs.

Discussion: This 2000 paper by Bernard Chazelle introduced a data structure called a soft heap, which is a heap data structure that can perform any operation in amortized constant time. In exchange for this ideal performance bound, a new limitation is imposed: at any time, any element in the heap may have its key increased at the discretion of the data structure; such an element is said to be corrupted. There is an upper limit of εn corrupted elements, where n is the number of operations completed so far and ε is a fixed parameter. For example, if ε = 0.01, and we insert 1000 elements into the heap, at most 10 elements will be corrupted.

Heaps are one of the fundamental data structures studied in computer science. Like dictionaries or hash tables, they store a set of elements, each with an associated key. Heaps support four fundamental operations:

  • insert: Insert a new element, given the element and its key.
  • extract minimum: Get the element with minimum key and delete it from the heap.
  • decrease key: Update an existing element’s key to a smaller value.
  • merge or meld: Combine two existing heaps together into one large heap.

The most obvious application of these operations is a priority queue: if the keys indicate priority (with smaller values indicating higher priority), then insert adds a new task to the queue, and extract minimum retrieves the current highest-priority task. Heaps also form the basis of the heapsort sorting algorithm, which in its simplest form inserts each element into a heap one-by-one and then uses extract minimum repeatedly to get the result list in order.

The sorting application demonstrates an important lower bound regarding heaps: it is impossible to design a heap data structure that can do both the insert and extract minimum operations in amortized constant time, for if this were possible we would be able to do a comparison sort in linear (O(n)) time; but any such sort requires Ω(n log n) time. Indeed, the best known heap data structure, the Fibonacci heap, developed in 1984, can do all these operations in amortized constant time except for extract minimum, which requires Θ(log n) amortized time.

But what if there were some way to modify the heap data structure that let us break through this lower bound and do all operations in amortized constant time without contradicting the above result? This is exactly what the soft heap does. A soft heap is like a normal heap, but with the troubling property that the key you assign to an element when you insert it may not stay that way; more formally, the data structure may choose at its discretion to increase the key of any element at any time. Any element that has had its key increased is called corrupted. It’s easy to see how this might be helpful in reducing complexity; if the data structure just decided to increase the keys of all elements to infinity, it would be trivial to do all operations in constant time.

But to make soft heaps actually useful and not just fast, we need an additional constraint. This constraint is as follows: each soft heap has a fixed parameter ε called the error rate. Roughly speaking, this controls what proportion of the elements may be corrupted at any given time. More formally, it specifies that if we start with the empty heap and do n operations, the resulting heap will have at most εn corrupted elements. If all n operations are inserts, then indeed ε will be an upper bound on the proportion of corrupted elements. If extract minimum operations are included, the relationship is no longer so clear – it’s possible all elements may become corrupt.

As Chazelle so elegantly says, “despite this apparent weakness, the soft heap is optimal and—perhaps even more surprising—useful.” Perhaps the simplest nontrivial application of soft heaps is finding the median of a list of values in linear time. There is a classical algorithm for doing this, but the implementation and analysis of the algorithm based on the soft heap is much simpler. All we have to do is set ε = 1/3, then insert all n elements into the heap, and then do extract minimum n/3 times. Depending on exactly which elements got corrupted, the final element extracted, the pivot, will be somewhere between the (n/3)th and (2n/3)th smallest element in the list. We then partition the list into the sublist of elements less than or equal to the pivot, and those greater than the pivot, and recursively invoke this procedure on the one containing the median. This is similar to the quicksort algorithm, but only recursively processing one of the sublists instead of both, and it runs in worst-case linear time.

Historically, soft heaps were invented with a specific application in mind: computing minimum spanning trees. Chazelle has achieved some fame for discovering the asymptotically best known algorithms for several problems, and this is one of them: his soft-heap based algorithm can compute minimum spanning trees in O(n α(n)) time, where α is the inverse Ackermann function, a very slowly-growing function that is less than 5 for all remotely practical values of n. I don’t discuss the details of this algorithm here because they’re pretty complex, but you can see Chazelle’s paper (Bernard Chazelle. A Minimum Spanning Tree Algorithm with Inverse-Ackermann Type Complexity. JACM 47(6):1028–1047, 2000, PDF). In 2002, Pettie and Ramachandran presented a minimum spanning tree algorithm based on soft heaps that is known to be asymptotically optimal.

So how does it work? In Chazelle’s original implementation, the soft heap was a simple variant on the binomial heap. Instead of storing a single element at each node, it would store a list of elements, associated with a single key that is an upper bound on the original keys of all the elements. This is the source of the “corruption” – any values in this list whose original keys are smaller than the common key are corrupted. Such “multivalued” nodes could only appear near the top levels of the trees making up the heap, which limits how many corrupted elements there can be. These lists are formed during the soft heap’s special “sift-up” routine, which under certain conditions will run twice on the same node, causing it to merge (concatenate) element lists with one of its children. There are a number of subtleties to this process, however, and to explicate them Chazelle gives complete C source code for the data structure in a literate programming style.

Recently a simpler implementation of soft heaps by Kaplan and Zwick has appeared (A simpler implementation and analysis of Chazelle’s soft heaps. In Proceedings of the Nineteenth Annual ACM -SIAM Symposium on Discrete Algorithms (Jan 4-6, 2009).) They use the original soft heaps concept of storing a list of items at each node, and of occasionally running sift on elements twice during sifting, but rather than being based on binomial heaps, this implementation is based on a set of heap-ordered binary trees. Also, their condition for running sift twice on a node is somewhat less arbitrary: each node x has an associated “size” value that increases exponentially with rank, and the size |list(x)| of the element list at any internal node x must always satisfy size(x) ≤ |list(x)| ≤ 3size(x).

How do you know if soft heaps might be useful for something you’re doing? Due to their peculiar behavior, they tend to be most useful for inventing new algorithms with better worst-case performance than existing algorithms, particularly in situations where the “approximate rank” – or approximate position in sorted order – of an element in a list is useful. It tends to be applied in places where ordinary heap data structures have traditionally been applied in the past, like graph algorithms and sorting. Beyond this though, there may still remain a rich set of untapped applications of soft heaps to be found that employ them in novel ways.

The author releases all rights to all content herein and grants this work into the public domain, with the exception of works owned by others such as abstracts, quotations, and WordPress theme content.

Posted in Algorithms and optimization | Leave a Comment »

Optimization by simulated annealing

Posted by dcoetzee on March 19, 2009

Citation: S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi. Optimization by Simulated Annealing. Science, May 1983, Vol. 220, no. 4598, pp. 671-680. (PDF).

Abstract: There is a deep and useful connection between statistical mechanics (the behavior of systems with many degrees of freedom in thermal equilibrium at a finite temperature) and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters). A detailed analogy with annealing in solids provides a framework for optimization of the properties of very large and complex systems. This connection to statistical mechanics exposes new information and provides an unfamiliar perspective on traditional optimization problems and methods.

Discussion: This 1983 paper introduced the heuristic optimization technique of simulated annealing, inspired by physical simulation algorithms in statistical mechanics, and applied it to problems of hardware design and the traveling salesman problem.

Annealing is a centuries-old technique in metallurgy, once practiced by blacksmiths and now widely used in industry. It aims to regularize the crystal microstructure of the material, removing defects, which softens it and makes it easier to work. The Gibbs free energy of the material, a measure of the potential energy contained in its structure, is effectively minimized by this procedure.

To perform annealing, a metal is heated to its annealing temperature, slightly above its freezing point, and kept at that temperature for a while. Then it is allowed to slowly cool until it recrystallizes. The system can be seen as a massively parallel solver for minimizing the potential energy: at high temperature, molecules travel rapidly around the substance in search of a global minimum, but will fail to converge on a local minimum. As the substance cools, they can no longer travel as far and they begin to seek and converge on the nearest local energy minimum.

From a more algorithmic point of view, this can be modelled as a random walk on a search graph. Each node represents a solution or state and has a certain fitness value, and the nodes adjacent to each node represent other similar solutions (in the sense that their fitness value is not expected to be dramatically different). At each step the algorithm selects an adjacent node at random. It will transition to the new state if it is a better solution; it may also transition to the new state if it a worse solution, with a probability determined by the global “temperature” parameter. This behavior allows the search to escape local minimums. As the temperature drops, this becomes less likely and the search drops into a nearby local minimum.

A simple example of this is the travelling salesman problem: given a set of cities and distances between each pair of cities, find a minimum-length tour that passes through each city once and returns to the starting point. Finding the minimum-length tour is an NP-complete problem and so not likely to admit an efficient solution any time soon, but simulated annealing can effectively find a solution close to the minimum. In this case a node in the search graph represents a particular tour of the cities, and the neighbors of a node are found by reversing two adjacent cities on the path. Solution quality is determined by total tour length.

Prior to simulated annealing, the most prominent heuristic methods were based on random-restart hill climbing, where we choose a number of random starting states and then make small changes to the solution as long as those small changes improve the fitness. From the physical perspective, this is the equivalent of taking, say, a set of 100 swords, heating them each to the annealing point, and then rapidly quenching them in water and picking the one that turns out softest. It’s ineffective at locating global minimums in large problems with many parameters.

The specific parameters used in simulated annealing, for example determining the probability with which a transition to a higher-energy state should be made, were borrowed from statistical mechanics and in particular the Metropolis–Hastings algorithm for simulation of microscopic systems.

Most of the applications demonstrated in the original simulated annealing paper were not simple classical ones like TSP but rather practical hardware design problems, largely because the authors had background in this area. This proceeds in stages: first the circuits are partitioned into groups small enough to fit on chips, then the locations of the chips are selected (placement), then routing of wires between the chips is done. The main parameter being minimized is the number of signals crossing between each group of circuits.

Unfortunately, simulated annealing is also a notoriously difficult technique to analyze and requires expertise to set up – its performance depends on how the probabilities are calculated, the rate at which the temperature is lowered, the allowed transitions between states, and so on, each of which generally must be empirically determined for a specific problem.

It’s also more applicable to some types of problems than others: for example, it tends to do well for problems for which many near-optimal solutions exist and are a relatively small number of steps away from the starting random solution. Physical substances tend to have this property: it’s extremely unlikely that they would enter a perfect lowest energy configuration, but with high probability annealing will bring them to a configuration close to the lowest possible energy.

Simulated annealing also tends to perform poorly for problems where the fitness function has many “deep wells,” very low local minimums that are difficult to escape from even at high temperature. Substances like glass exhibit this property in real life: even when cooled slowly, they will tend to fail to arrange themselves in a regular crystal structure, preferring the amorphous structure of glass.

Since the original work a number of improvements have been proposed to the method. Adaptive simulated annealing (Ingber, 1989) attempts to avoid the design problem of choosing parameters like the cooling schedule by automatically choosing these as the algorithm progresses, based on how the search is behaving. The methods of quantum annealing (Apolloni et al, 1989) and stochastic tunneling (Wenzel and Hamacher, 1999) seek to escape deep wells by occasionally “jumping” across nearby energy barriers. Other stochastic optimization techniques, such as genetic algorithms and ant colony optimization, generate new solutions in ways other than local modification.

An interesting question to ask as we enter the age of parallel computing is: if we reconsider the original motivating physical problem for simulated annealing, a group of particles independently seeking minima and responding to the global temperature, could a parallel method be deduced from this whereby a large number of small circuits or processors each simulate the behavior of a single component as the global temperature decreases? I haven’t yet seen any work in this area.

The author releases all rights to all content herein and grants this work into the public domain, with the exception of works owned by others such as abstracts, quotations, and WordPress theme content.

Posted in Algorithms and optimization | Tagged: , , , | 1 Comment »

A Machine Program for Theorem-Proving

Posted by dcoetzee on March 14, 2009

Citation: Davis, M., Logemann, G., and Loveland, D. A machine program for theorem-proving. Communications of the ACM 5, 7 (Jul. 1962), 394-397. (PDF)

Abstract: The programming of a proof procedure is discussed in connection with trial runs and possible improvements.

Discussion: This short 1962 paper introduced the Davis-Logemann-Loveland (DLL) or Davis-Putnam-Logemann-Loveland (DPLL) algorithm, the basic algorithm for solving the boolean satisfiability (SAT) problem; it remains the foundation of all the most efficient SAT solvers used today. The paper shows it age with obsolete terminology and special attention to implementation details, making it difficult to read in some places, but the algorithm itself is straightforward to describe in modern terms.

In the boolean satisfiability problem, we are given a formula made up of three operations, AND, OR, and NOT, and a set of boolean variables, each of which may take on either the value true or false. The goal is to determine whether there is a particular assignment of true-false values to the variables that will make the formula as a whole true. For example, given this formula:

(x1 OR NOT(x2)) AND (NOT(x1) OR x2),

a satisfying assignment would be x1=true,x2=true. If a formula has no satisfying assignments, it is said to be unsatisfiable; the name of this paper comes from the useful fact that many simple theorems can be proven to be true by showing that a particular boolean formula is unsatisfiable.

The most naive algorithm for solving the problem is exhaustive search, trying every possible combination of truth values; but if there are n variables, there are 2n of these.

Another simple but more effective method is backtracking search: choose a value for one variable, plug it in the formula, and then simplify the formula. There are many ways to simplify the formula, but one obvious way is using boolean identities such as NOT true = false, NOT false = true, (x AND false) = false, and (x OR true) = true. Then choose a value for another variable and continue in this manner until the formula reduces to either true or false. If it reduces to true, a satisfying assignment has been found; otherwise, we backtrack to the last variable choice (that we have not already tried both values for) and switch it to the other value. A conceptual binary tree is constructed, with a variable at each node, and each edge representing an assignment of a value to a variable; this tree is called the search tree, and the search effectively performs an in-order traversal of it. The leaves are either true or false. If the formula is unsatisfiable, eventually it will walk the entire tree without encountering a true leaf.

The trouble with backtracking search is that if the variables are chosen in a poor order, the formula won’t reduce very much, the search tree will be large, and the runtime will still be similar to exhaustive search. To deal with this, the early Davis-Putnam (DP) method took a different approach: they observed that any boolean formula could be reduced to a canonical form called conjunctive normal form (CNF). In this form, the formula is written as an

  • A literal is defined as either a variable or its negative (NOT(x)).
  • A clause is defined as an OR (disjunction) over one or more literals.
  • The formula is written as an AND (conjunction) of clauses.

The example formula above was in this form. This form is convenient because it’s expressive enough to express many interesting constraints, but also simple enough to build more efficient algorithms for. The original techniques used by the authors to convert formulas to conjunctive normal form actually do not scale in some cases, but later it would be shown that there is a way to efficiently reduce any boolean formula to a formula in conjunctive normal form that isn’t too much larger.

Once the formula is in this form, we can improve backtracking search with an important heuristic called unit propagation: the idea is to search the formula for clauses containing only a single literal (either x or NOT(x)); these are called unit clauses. Because this literal must be true to satisfy the clause, and every clause must be true to satisfy the formula, the value of that variable is determined, so we can assign that variable a single value and simplify the formula. Unit clauses rarely appear in the original formula, but often pop up as we begin to assign some of the variables and simplify it. The effect compared to ordinary backtracking search is that the search tree has some nodes with only one child, reducing its average degree and the overall search time. This is essentially the DPLL algorithm in its simplest form.

The DPLL technique was intended as an improvement over the classical Davis-Putnam (DP) algorithm from 1960. The DP algorithm was also based on conjunctive normal form, but instead of performing a backtracking search, it eliminated a variable x by separating the clauses into three groups: the clauses A containing x (without negation), the clauses B containing NOT(x), and the clauses R containing neither. Then, the original formula is unsatisfiable if and only if (A OR B) AND R is unsatisfiable. Unfortunately, once this is reduced back to conjunctive normal form, it may become much longer; if this is done several times, the formula can become rapidly very long and unwieldly. DPLL provided an early example of a space-time tradeoff: instead of checking (A OR B) AND R for unsatisfiability, it checks first A AND R, then B AND R; both of these are already in conjunctive normal form, so there’s no formula size blowup. This was even more important on computers of the day with limited storage; computational complexity was not yet born then, so much of their discussion of this matter is informal.

Another technique introduced in DPLL that has since fallen out of favor is pure literal elimination: this says that if a variable is either never negated or always negated, it can always be assigned the value that makes all its literals true. This similarly reduces average search time, but only in cases where this occurs, which are rare in practice.

This simple scheme is amenable to many improvements: if there are no unit clauses, we still have great freedom in choosing which variables to reduce in which order; we can visit the nodes of the search tree in different orders than a simple in-order search, accelerating discovery of a satisfying assignment; we can perform more sophisticated simplification of the formula at each step, shrinking the tree size. All of these can dramatically affect overall running time, and together modern improvements like this have allowed SAT solvers to scale to formulas with millions of clauses. But because the SAT problem is NP-complete, there is no known general algorithm that is efficient on all instances; even the best methods will encounter instances on which they make poor variable choices and end up spending immense amounts of computing time searching a large search tree.

BPLL schemes were effective at the time because 1960s computers had a single CPU and were memory-starved. Many modern speedup efforts central around parallelization of the traversal of the search tree. An example of this is massively parallel SAT solvers based on FPGAs, which replicate a simple circuit many times to traverse part of the search space in parallel with many other simple solvers. As we move out of the age of increasing clock speed and into the age of parallel systems, solvers that can exploit parallelism effectively will become more important.

An interesting question to ask is: could there be an efficient variant of a scheme like the original BP that expands the formula at each step, but requires no backtracking? On modern systems with much greater memory and external storage capacities, this might be a win on certain instances. I haven’t seen any work along these lines.

The author releases all rights to all content herein and grants this work into the public domain, with the exception of works owned by others such as abstracts, quotations, and WordPress theme content.

Posted in Algorithms and optimization, Formal verification | Tagged: , , , , | 1 Comment »