Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications
Posted by dcoetzee on 4th March 2009
Citation: Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols For Computer Communications (San Diego, California, United States). SIGCOMM ’01. ACM, New York, NY, 149-160. (PDF).
Abstract: A fundamental problem that confronts peer-to-peer applications is to efficiently locate the node that stores a particular data item. This paper presents Chord, a distributed lookup protocol that addresses this problem. Chord provides support for just one operation: given a key, it maps the key onto a node. Data location can be easily implemented on top of Chord by associating a key with each data item, and storing the key/data item pair at the node to which the key maps. Chord adapts efficiently as nodes join and leave the system, and can answer queries even if the system is continuously changing. Results from theoretical analysis, simulations, and experiments show that Chord is scalable, with communication cost and the state maintained by each node scaling logarithmically with the number of Chord nodes.
Discussion: This 2001 paper introduced Chord, one of the first published distributed hash tables and one of the most popular ones in practice due to its simplicity and good practical performance.
Distributed hash tables are data structures that provide a “lookup” function for a distributed group of computers: given the key for a piece of data, find the computer where that piece of data is stored. This is easy to do with a large centralized server that keeps track of where each piece of data is stored, but the problem becomes more complicated when we want to implement one on a peer-to-peer network where all nodes are running the same code and none is willing to commit the resources to track the entire network. Chord is one of the earliest and simplest data structures of this sort. In a system with n machines, it is able to perform a lookup by contacting only O(log N) other nodes, and only has to store data about O(log N) other nodes; no node has complete knowledge of the network.
The first task a distributed hash table must do is decide which nodes store which data items. Chord relies on the use of a hash function, typically a cryptographic hash function such as SHA1, to assign a unique integer identifier to each node (the hash of its IP address) and to each key that may be looked up. Then, a key k is assigned to a node n if and only if hash(k) ≤ hash(n) and there is no other node with a hash between hash(k) and hash(n). If there are no nodes with hashes greater than hash(k), it is assigned to the node with smallest hash. This can be visualized by imagining all the hash values laid out on a large circle or ring, increasing in the clockwise direction; to find the node for a key, you go to the key’s hash on the circle and proceed clockwise until you encounter a node’s hash. This system is called consistent hashing, and was conceived by D. Lewin et al in 1997 (Karger, D., Lehman, E., Leighton, F., Levine, M., Lewin, D., and Panigrahy, R. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, May 1997, pp. 654-663. PDF). It’s convenient for distributed hash tables because if a node joins or leaves the system, only a small number of data items have to be reassigned (those between it and the previous node), and only two nodes have to communicate to reassign them; the most obvious alternative, a system that assigns keys at random, would need to contact all nodes to locate the data to be reassigned.
The most trivial distributed hash table in this arrangement would be for each node to keep the IP address of the next node in the clockwise direction on the circle, called its successor; this connects them all in a circularly linked list. A node can follow these links by asking each node in turn for its successor, until the one storing the desired key is reached. This works fine, but is very slow: it requires roundtrip communication with a series of N/2 nodes on average, and N-1 nodes in the worst case.
To speed this up, Chord introduces the concept of fingers: the ith finger of a node n is the first node in the clockwise direction from the hash value hash(n) + 2i-1. Every node stores a table of fingers and the IP addresses of the corresponding nodes. To locate the node for a key, a node contacts the largest-indexed finger on its table that precedes the key on the circle, and forwards the request to that node. This repeats until the request reaches the node carrying the data. The request only needs to be forwarded at most log N times, because each node stores the most information about nodes closely following it on the circle, and the closer we get the more precise information we have.
When a node joins or leaves the network, it needs to set up its finger tables, and update the finger tables of other nodes to include it. Using the above lookup primitive, it is easy to locate each of the necessary nodes in log N time, so that nodes can join or leave the network in O(log2 N) time.
Part of why Chord is useful in peer-to-peer systems is that it’s resilient to nodes continuously joining and leaving the network: as long as the successor links remain correct, lookup remains correct; finger tables are just an optimization and they can be updated more infrequently without significantly impacting performance. A stabilization procedure is responsible for ensuring that successors remain correct and nodes have a consistent view of successors. Similarly, node failures, where a node drops from the network without warning, can be dealt with by adding a list of the next log N successors to each node; if the successor ever fails, the request can be forwarded to the next live successor. It’s unlikely all log N successors will fail simultaneously before the stabilization procedure can update the successor list. To prevent data loss in the case of node failure, a replication protocol can be built on top of Chord which simply stores the same data under two or more distinct keys.
In line with its goal as a practical system, a large section (section 6) was dedicated to experimental results. Most of these were based on simulations with large numbers of nodes (say 10000) to demonstrate scalability and stability. They show that the average number of nodes contacted for lookups is actually close to (log N)/2; and they give an important optimization, which is for each physical node to run a number of “virtual” nodes with the same IP address; this helps to divide the hash space more evenly and decrease the variance in the number of keys per physical node. They also ran an experiment on the Internet with a smaller network of 10 machines across the United States, showing a total latency of about 200 ms for lookups.
As is frequently the case with scientific discoveries, a number of researchers independently invented systems with similar capabilities to Chord at about the same time in 2001. These systems included Pastry and Tapestry, which both emphasize minimizing latency over simplicity, and CAN (content addressable network), which organizes its nodes in a d-dimensional Cartesian coordinate space instead of a ring, and requires more parameter tuning than Chord. Since then a number of other distributed hash tables technologies have been developed, such as Maymounkov and Mazières‘s Kademlia in 2002 (PDF), which relies on an XOR-based metric topology.
Chord, particularly the popular MIT implementation MIT Chord, has undergone a great deal of performance refinement and further analysis since, including techniques like proximity routing (preferring to route to nodes with nearby nodes), geographic overlay construction (using physical location to inform the structure of the topology), and landmark routing. These all aim to lower the latency of lookup in practice. A Berkeley course project by Li Zhuang and Feng Zhou explained and evaluated a variety of these refinements.
An important limitation of Chord (discussed briefly at the end of section 4.3) is its fundamental assumption that the hash function behaves randomly. Although it uses cryptographic hashes that are collision-resistant, it isn’t difficult to generate IPs or keys that hash to a small segment of the hash space, by simply generating random values until you find one with a hash in that segment, rendering it vulnerable to certain attacks. Other distributed hash tables, such as Pastry, Tapestry, and Symphony, incorporate randomization into their topology.
A more extensive modern discussion of Chord is given in Chapter 2 of Monica Haladyna Braunisch’s 2006 Master’s Thesis at MIT (PDF).
The author releases all rights to all content herein and grants this work into the public domain, with the exception of works owned by others such as abstracts, quotations, and WordPress theme content.
Tags: chord, consistent hashing, distributed hash table, hash table, peer-to-peer
Posted in Distributed systems | 3 Comments »