Papers in Computer Science

Discussion of computer science publications

Archive for the 'Distributed systems' Category

Virtual Time

Posted by dcoetzee on 23rd April 2009

Citation: David R. Jefferson. Virtual time. ACM Transactions on Programming Languages and Systems, 7, 3 (Jul. 1985), 404-425. (PDF)

Abstract: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control. Virtual time provides a flexible abstraction of real time in much the same way that virtual memory provides an abstraction of real memory. It is implemented using the Time Warp mechanism, a synchronization protocol distinguished by its reliance on lookahead-rollback, and by its implementation of rollback via antimessages.

Discussion: This 1985 paper introduced virtual time, a concept that allows a distributed system to be organized around a linear global clock; rather than maintain a synchronized clock, it achieves efficiency by having each node maintain its own local virtual time and performing rollback when a node receives a message “in the past.” Although not widely adopted, it has served as an influential model of a general system with optimistic concurrency and rollback.

To motivate the concept, let’s for a moment consider a magical distributed system with the following properties:

  • The system is a set of nodes, each capable of doing any amount of local processing instantaneously.
  • Each node can choose at any time to send a message to any other node. Moreover, it can specify precisely when that message will be received – the only constraint is that it cannot be received in the past.

Such a system is capable of intuitively describing many different distributed systems. For example, you could have a distributed simulation (such as a physical simulation) in which each message tells a node to simulate a particular kind of event, and the times at which the messages arrive correspond to the times at which the events they simulate occur. Consequently all events are simulated in order. You could ensure that the transactions of a database system are committed in order by assigning a time to them indicating when they’re each supposed to atomically occur. Finally, most simply, you could create a system in which all messages are received in order, by merely setting the received times to be the same as the sending time. (These scenarios are based on section 5 from the paper.)

In real life, of course, we don’t have instantaneous processing, and we can’t control when messages will be received. The concept behind this paper is to get rid of “real time”, and replace it by a new sort of time called virtual time (by analogy with virtual memory) that we have more control over. We can control the rate at which virtual time advances, and in fact time it can advance at different rates at different nodes in the system: each node keeps a local clock. To control the time at which messages are received, we introduce a queue at each node that queues up all received messages and does not process them until their virtual time arrives.

Not having to agree on a global virtual time greatly decreases the communication cost, but the problem with this flexibility is that once the local clocks have gotten out of sync, some strange scenarios can arise: if node A is lagging behind node B, it may send a message to B that arrives “in the past” from B’s perspective. But B may have already taken action based on the fact that it didn’t receive that message, including sending new messages to other nodes.

There are a number of solutions to this problem. The most obvious one is to have each node wait until its clock is the furthest in the past of all nodes before proceeding. This corresponds to pessimistic concurrency, and is the equivalent of having a global lock in a threaded program, with all the same consequences of low concurrency and poor throughput. The one taken by Jefferson is a form of optimistic concurrency: each local clock simply jumps to the received time of the next waiting event, so that the node is never idle, maximizing throughput. If it ever receives an new message “in the past,” it does a rollback to a point in time before that message was received and then proceeds forward again. To facilitate this it takes periodic snapshots of its state, tagged with virtual times. This is all transparent to the node’s program; from its perspective, virtual time never decreases. This is unlike, say, transactions, where a failed transaction needs to be detected and retried.

The big remaining problem is that during rollback there are certain things that can’t be undone locally, most importantly the sending of messages to other nodes. There are, again, multiple ways of dealing with this, but the one used by Jefferson is that whenever a message send is rolled back, an “antimessage” indicating this is sent to the same node with the same received time. At the destination, if the corresponding message is still queued, they collide and eliminate one another. If the corresponding message has already been processed, the antimessage arrives in the past and causes that node in turn to rollback and not receive the original message. One requirement for implementing this is that all nodes keep a queue of messages already processed.

A nontrivial problem is to show that the system as a whole makes progress amidst all these cascading rollbacks. Another serious problem is the performance issue of how to prevent the nodes from running out of memory from storing unbounded queues of messages and state snapshots. To deal with these issues, Jefferson defines the concept of global virtual time (GVT), which is the earliest virtual time at which anything in the system is currently happening. Because no node can send a message into its own past, no message can be received before the GVT, and no node will ever have to rollback to a time before the GVT. Consequently, any state snapshot or already-processed message marked with a virtual time prior to GVT can be discarded, in a process Jefferson calls “fossil collection” (this is oddly prophetic, as the term “garbage collection” was not coined until 1994 by Kaushik Ghosh). Additionally, it can be shown that, even though individual local clocks may roll back, GVT never decreases. As long as no message is lost and no node blocks indefinitely, it will eventually increase, which is enough to guarantee that the system makes progress. Eventually, all message queues will be empty, and the system will terminate.

The main downside to a virtual time system is that it fundamentally relies on the assumption of what the author calls “temporal locality” – that messages won’t arrive in the past too often. Less obviously, it also depends on the efficiency of the implementation of state snapshotting, message queuing, and determining the global virtual time. Because virtual time is such a general framework, applying to distributed systems from multiprocessors to databases to LANs, this empirical analysis depends very much on the specific implementation. Evaluations of a highly-optimized practical implementation on multiprocessors were done in a series of papers by Richard M. Fujimoto (“Time warp on a shared memory multiprocessor” (1989), “The effect of memory capacity on Time Warp performance” (1993), “An adaptive memory management protocol for Time Warp parallel simulation” (1994), “An Empirical Evaluation of Performance-Memory Trade-Offs in Time Warp” (1997)). I’m not aware of any thorough evaluations in other scenarios.

The concept of virtual time was later generalized by Friedemann Mattern in his “Virtual Time and Global States of Distributed Systems,” which instead of using a totally ordered linear virtual time relies on a partially ordered virtual time (Parallel and Distributed Algorithms, 1989, 215-226, PDF). The motivation is to eliminate unnecessary ordering constraints imposed by the total order, at the cost of some conceptual complexity. Jefferson briefly hinted at this possibility (“[v]irtual times may be only partially ordered”).

The author releases all rights to all content herein and grants this work into the public domain, with the exception of works owned by others such as abstracts, quotations, and WordPress theme content.

Posted in Distributed systems | No Comments »