Redis Cluster Design Proposal (work in progress)

Network layout
==============

 - N different Data Nodes. Every node is identified by ip:port.
 - A single Configuration Node.
 - M different Proxy Nodes (redis-cluster).
 - A single Handling Node.

Configuration Node
==================

 - Contains information about all the Data nodes in the cluster.
 - Contains information about all the Proxy nodes in the cluster.
 - Maps the keyspace to different nodes.

The keyspace is divided into 1024 different "hashing slots".

Given a key, compute SHA1(key) and take the last 10 bits of the result as a 10-bit number identifying the key's slot (from 0 to 1023).
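A minimal Python sketch of this rule follows. It assumes that "the last 10 bits" means the low-order 10 bits of the 160-bit digest read as a big-endian number. Those are the low bits of the final two digest bytes. The function name `key_slot` is only illustrative.

```python
import hashlib

NUM_SLOTS = 1024  # 2**10 hashing slots


def key_slot(key: bytes) -> int:
    """Return the hashing slot (0..1023) of a key.

    Assumption: the 20-byte SHA1 digest is read as a big-endian number,
    so its last 10 bits are the low 10 bits of the final two bytes.
    """
    digest = hashlib.sha1(key).digest()
    last_two = (digest[-2] << 8) | digest[-1]
    return last_two & (NUM_SLOTS - 1)  # keep only the low 10 bits


if __name__ == "__main__":
    print(key_slot(b"abc"))  # 157
```

Under that reading, `key_slot(b"abc")` is 157. The SHA1 digest of "abc" ends in the bytes 0xd8 0x9d, and 0xd89d & 0x3ff = 0x9d.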

The Configuration node maps every slot of the keyspace to K different Data nodes.

Only one client at a time can modify the Configuration node. Locking is performed using SETNX.
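The locking rule can be illustrated with an in-memory stand-in for the Configuration node. SETNX sets a key only if it does not already exist. It replies 1 when it set the key and 0 otherwise, so the first client to create the lock key wins. The `FakeConfigNode` class, the `config:lock` key name and the helper functions are hypothetical. The proposal also does not say how a lock left behind by a crashed client would be recovered, so the sketch ignores that case.

```python
class FakeConfigNode:
    """In-memory stand-in for the Configuration node (hypothetical)."""

    def __init__(self):
        self.data = {}

    def setnx(self, key, value):
        # Same contract as Redis SETNX: set only if the key is absent,
        # reply 1 if the key was set and 0 otherwise.
        if key in self.data:
            return 0
        self.data[key] = value
        return 1

    def delete(self, key):
        if key in self.data:
            del self.data[key]
            return 1
        return 0


LOCK_KEY = "config:lock"  # hypothetical key name


def try_lock(config, client_id):
    """Return True if client_id now holds the configuration lock."""
    return config.setnx(LOCK_KEY, client_id) == 1


def unlock(config, client_id):
    # Only the holder releases the lock. Against a real server this
    # check-then-delete is not atomic; it is fine for this toy store.
    if config.data.get(LOCK_KEY) == client_id:
        config.delete(LOCK_KEY)


if __name__ == "__main__":
    config = FakeConfigNode()
    print(try_lock(config, "handler-1"))  # True
    print(try_lock(config, "handler-2"))  # False, the lock is taken
    unlock(config, "handler-1")
    print(try_lock(config, "handler-2"))  # True
```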

The Configuration node should be replicated, since there is a single Configuration node for the whole network.

The Configuration node is a standard Redis server, like every other Data node.

Data Nodes
==========

Data nodes just hold data and are normal Redis processes. No configuration is stored on them, and they are not "active" members of the cluster: they just receive normal Redis commands.

Proxy Nodes
===========

Proxy nodes get requests from clients and route these requests to the right Data nodes.

When a Proxy node is started, it needs to know the address of the Configuration node in order to load the information about the Data nodes and the mapping between the keyspace and the nodes.

On startup a Proxy node also registers itself in the Configuration node. It refreshes this registration every N seconds (via a key with an EXPIRE set), so that a failed Proxy node can be detected when its key expires.
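The registration and failure detection scheme can be sketched as follows. The sketch assumes one key per Proxy node named `proxy:<ip:port>` and a 10 second expiry. Both the key naming and the TTL are illustrative, since the proposal only speaks of "N seconds". The toy store takes an injectable clock so the expiry can be shown without waiting. The proxy addresses are hypothetical.

```python
import time


class ExpiringStore:
    """Toy key/value store with per-key expiry standing in for the
    Configuration node (hypothetical; a real Proxy node would send SET
    and EXPIRE to the Redis server instead)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.data = {}  # key -> (value, expiry time)

    def set_with_expire(self, key, value, ttl):
        self.data[key] = (value, self.clock() + ttl)

    def live_keys(self, prefix):
        now = self.clock()
        return sorted(k for k, (_, expires) in self.data.items()
                      if k.startswith(prefix) and expires > now)


PREFIX = "proxy:"     # hypothetical key naming scheme
HEARTBEAT_TTL = 10    # seconds; the proposal only says "every N seconds"


def register_proxy(store, proxy_addr):
    # Called at startup, then periodically, well before the TTL elapses.
    store.set_with_expire(PREFIX + proxy_addr, "alive", HEARTBEAT_TTL)


def alive_proxies(store):
    # A Proxy node whose key has expired is considered failed.
    return [k[len(PREFIX):] for k in store.live_keys(PREFIX)]


if __name__ == "__main__":
    now = [0.0]
    store = ExpiringStore(clock=lambda: now[0])
    register_proxy(store, "10.0.0.1:7000")
    register_proxy(store, "10.0.0.2:7000")
    now[0] = 5.0
    register_proxy(store, "10.0.0.1:7000")  # only the first proxy refreshes
    now[0] = 12.0
    print(alive_proxies(store))  # ['10.0.0.1:7000']
```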

The Proxy node is also in charge of reporting failing Data nodes to the Configuration node, so that the Handling node can take the appropriate action.

When a new Data node joins or leaves the cluster, and in general when the cluster configuration changes, all the Proxy nodes receive a notification and reload the configuration from the Configuration node.

Handling Node
=============

The Handling node is a special Redis client with the following roles:

 - Manages the cluster configuration stored in the Configuration node.
 - Is in charge of adding and removing nodes dynamically from the network.
 - Relocates keys on node additions / removals.
 - Signals configuration changes to the Proxy nodes.

More details on hashing slots
=============================

The Configuration node holds 1024 keys in the following form:

    hashingslot:0
    hashingslot:1
    ...
    hashingslot:1023

Every hashing slot is actually a Redis list containing one or more ip:port pairs. For instance:

    hashingslot:10 => 192.168.1.19:6379, 192.168.1.200:6379

This means that keys hashing to slot 10 will be stored on both Data nodes, 192.168.1.19:6379 and 192.168.1.200:6379.
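The Python sketch below builds the whole set of `hashingslot:N` entries for a list of Data nodes, with K copies of every slot. The proposal does not say how slots are spread over the nodes, so the round-robin policy and the `build_slot_table` name are assumptions. Each resulting list would be stored in the Configuration node as a Redis list (for instance with RPUSH).

```python
def build_slot_table(nodes, replicas, num_slots=1024):
    """Map every slot to `replicas` distinct Data nodes.

    Hypothetical policy: slot s is stored on nodes[s], nodes[s + 1], ...
    (indices taken modulo len(nodes)). Returns a dict from the
    Configuration node key name to the list of ip:port owners.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("need 1 <= replicas <= number of nodes")
    table = {}
    for slot in range(num_slots):
        owners = [nodes[(slot + i) % len(nodes)] for i in range(replicas)]
        table["hashingslot:%d" % slot] = owners
    return table


if __name__ == "__main__":
    nodes = ["192.168.1.19:6379", "192.168.1.200:6379"]
    table = build_slot_table(nodes, replicas=2)
    print(table["hashingslot:10"])
    # ['192.168.1.19:6379', '192.168.1.200:6379']
```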

When a client performs a read operation (via a Proxy node), the proxy contacts a random Data node among the Data nodes in charge of the given slot.

For instance, a client can send the following command to a Proxy node:

    GET mykey

"mykey" hashes to (for instance) slot 10. The Proxy therefore forwards the request to either Data node 192.168.1.19:6379 or 192.168.1.200:6379, and then forwards the reply back to the client.

When a write operation is performed, it is forwarded to both Data nodes in the example (and in general to all the Data nodes in charge of the slot).
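The read/write rule of the Proxy node fits in a few lines. This is a sketch only. The proposal does not say how a proxy tells reads from writes, so the `READ_COMMANDS` set is a hypothetical, deliberately incomplete classification. Failures of individual Data nodes are not handled.

```python
import random

# Commands treated as reads here; an illustrative subset, not a full list.
READ_COMMANDS = {"GET", "EXISTS", "TYPE", "LRANGE", "SMEMBERS"}


def route(command, slot_nodes, rng=random):
    """Return the Data nodes a Proxy node forwards `command` to.

    slot_nodes is the ip:port list stored under hashingslot:N for the
    slot of the command's key. Reads go to one random node of the list,
    writes go to every node of the list.
    """
    name = command.split()[0].upper()
    if name in READ_COMMANDS:
        return [rng.choice(slot_nodes)]
    return list(slot_nodes)


if __name__ == "__main__":
    nodes = ["192.168.1.19:6379", "192.168.1.200:6379"]
    print(route("GET mykey", nodes))        # one of the two nodes
    print(route("SET mykey hello", nodes))  # both nodes
```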

Adding or removing a node
=========================

When a Data node is added to the cluster, it is added via an LPUSH operation into a Redis list. This list is a queue of Data nodes that are ready to enter the cluster. It is held by the Configuration node, and entries can be added manually or via a configuration utility.

    LPUSH newnodes 192.168.1.55:6379

The Handling node checks the "newnodes" list for new elements from time to time. If there are new nodes pending to enter the cluster, they are processed one after the other, as in the following example.
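New nodes are added with LPUSH (at the head of the list), so popping them with RPOP (from the tail) processes them in arrival order. The sketch below shows one pass of the Handling node over the queue, using an in-memory stand-in for the list. `FakeList`, `drain_new_nodes` and the second address are hypothetical, and the periodic scheduling of the check is left out.

```python
from collections import deque


class FakeList:
    """Stand-in for the "newnodes" Redis list (hypothetical)."""

    def __init__(self):
        self.items = deque()

    def lpush(self, value):
        self.items.appendleft(value)  # LPUSH inserts at the head

    def rpop(self):
        # RPOP removes from the tail; None plays the role of a nil reply.
        return self.items.pop() if self.items else None


def drain_new_nodes(newnodes, add_node):
    """Process pending nodes one at a time, oldest first."""
    processed = []
    while True:
        addr = newnodes.rpop()
        if addr is None:
            break
        add_node(addr)  # e.g. move slots onto the new node
        processed.append(addr)
    return processed


if __name__ == "__main__":
    queue = FakeList()
    queue.lpush("192.168.1.55:6379")
    queue.lpush("192.168.1.56:6379")
    print(drain_new_nodes(queue, add_node=lambda addr: None))
    # ['192.168.1.55:6379', '192.168.1.56:6379']
```

In this sketch an entry is removed before it is processed. A crash in the middle of `add_node` would therefore lose it, a case the proposal does not cover.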

For instance, let's assume there are already two Data nodes in the cluster:

    192.168.1.1:6379
    192.168.1.2:6379

We add a new node, 192.168.1.3:6379, via the LPUSH operation.

We can imagine that the 1024 hash slots are split equally between the two initial nodes (512 each). To add the new (third) node, we have to incrementally move 341 slots from the two old servers to the new one, so that every node ends up with roughly a third of the slots.
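The slot arithmetic can be checked with a small planning function. With three nodes, divmod(1024, 3) gives 341 with a remainder of 1. One old node therefore keeps 342 slots, the other keeps 341, and the new node receives 170 + 171 = 341 slots. The function name and the choice of which slots to hand over are assumptions made for the sketch.

```python
def plan_moves(assignment, new_node):
    """Return (slot, from_node, to_node) moves that rebalance onto new_node.

    assignment maps slot -> owning node (one owner per slot, as in the
    single-server simplification). Slots only move from old nodes to the
    new one, so the result is balanced when the old nodes start balanced.
    """
    old_nodes = sorted(set(assignment.values()))
    nodes = old_nodes + [new_node]
    base, extra = divmod(len(assignment), len(nodes))
    # Old nodes keep the remainder slots; the new node receives `base`.
    target = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}

    moves = []
    for node in old_nodes:
        owned = sorted(s for s, owner in assignment.items() if owner == node)
        surplus = max(0, len(owned) - target[node])
        # Hand over the highest-numbered surplus slots (an arbitrary choice).
        for slot in owned[len(owned) - surplus:]:
            moves.append((slot, node, new_node))
    return moves


if __name__ == "__main__":
    old = ["192.168.1.1:6379", "192.168.1.2:6379"]
    assignment = {slot: old[slot % 2] for slot in range(1024)}  # 512 each
    moves = plan_moves(assignment, "192.168.1.3:6379")
    print(len(moves))  # 341
```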

For now, assume that every hash slot is stored on a single server only; the idea will be generalized later.

To simplify the implementation, every slot can be moved from one Data node to another in a blocking way. Read operations continue for all 1024 slots, but writes to the single slot being moved are delayed until the move from one Data node to the other is completed.

To do so, before moving a given slot, the Handling node marks it as "write-locked" in the Configuration node. It then asks all the Proxy nodes to refresh their configuration.

Then the slot is moved (on average 1/1024 of all the keys). The Configuration node is updated to reflect the new hashing slot configuration, the slot is unlocked, and the Proxy nodes are notified.
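The sequence above can be simulated in a single process. The sketch models the Configuration node as a Python dict holding slot owners and the set of write-locked slots, and models Data nodes as dicts. Its Proxy node keeps serving reads but holds back writes to a locked slot until the next configuration refresh. All class and function names are hypothetical. A real system would run these steps over the network and would also have to handle writes already in flight to the old node; the proposal does not detail that.

```python
import hashlib


def key_slot(key):
    # SHA1-based slot rule from the start of this document (assuming the
    # "last 10 bits" are the low bits of the final two digest bytes).
    digest = hashlib.sha1(key.encode()).digest()
    return ((digest[-2] << 8) | digest[-1]) & 1023


class Proxy:
    """Toy Proxy node: reads always go through, writes to a write-locked
    slot are held back until the next configuration refresh."""

    def __init__(self, config):
        self.config = config
        self.pending = []
        self.refresh()

    def refresh(self):
        # Reload slot owners and locks from the Configuration node, then
        # retry any delayed writes against the new configuration.
        self.owners = dict(self.config["owners"])
        self.locked = set(self.config["locked"])
        delayed, self.pending = self.pending, []
        for key, value in delayed:
            self.write(key, value)

    def write(self, key, value):
        slot = key_slot(key)
        if slot in self.locked:
            self.pending.append((key, value))
        else:
            self.owners[slot][key] = value

    def read(self, key):
        return self.owners[key_slot(key)].get(key)


def begin_move(slot, config, proxies):
    config["locked"].add(slot)      # mark the slot "write-locked"
    for proxy in proxies:           # ask every Proxy node to refresh
        proxy.refresh()


def finish_move(slot, dst, config, proxies):
    src = config["owners"][slot]
    for key in [k for k in src if key_slot(k) == slot]:
        dst[key] = src.pop(key)     # move the slot's keys to dst
    config["owners"][slot] = dst    # record the new owner
    config["locked"].discard(slot)  # unlock the slot
    for proxy in proxies:           # notify; delayed writes now reach dst
        proxy.refresh()


if __name__ == "__main__":
    a, b = {}, {}                   # two Data nodes, modeled as dicts
    config = {"owners": {s: a for s in range(1024)}, "locked": set()}
    proxy = Proxy(config)
    proxy.write("mykey", "v1")
    slot = key_slot("mykey")
    begin_move(slot, config, [proxy])
    proxy.write("mykey", "v2")      # delayed: the slot is write-locked
    print(proxy.read("mykey"))      # v1 (reads are still served)
    finish_move(slot, b, config, [proxy])
    print(b["mykey"], "mykey" in a)  # v2 False
```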

Implementation details
======================

Every Proxy node should keep persistent connections to all the Data nodes.

Running the Handling node and the Configuration node on the same physical computer is probably a good idea.