The first thing to do is to hash the key and issue the request on different servers based on the key hash. There are a lot of well known algorithms to do so, for example check the Redis Ruby library client that implements <i>consistent hashing</i>, but the general idea is that you can turn your key into a number, and than take the reminder of the division of this number by the number of servers you have:<br/><br/><pre class="codeblock python python python python python python python python python python python python python python python python python python python python python python python python" name="code">
server_id = crc32(key) % number_of_servers
</pre>This has a lot of problems since if you add one server you need to move too much keys and so on, but this is the general idea even if you use a better hashing scheme like consistent hashing.<br/><br/>Ok, are key accesses distributed among the key space? Well, all the user data will be partitioned among different servers. There are no inter-keys operations used (like SINTER, otherwise you need to care that things you want to intersect will end in the same server. <b>This is why Redis unlike memcached does not force a specific hashing scheme, it's application specific</b>). Btw there are keys that are accessed more frequently.<h3><a name="Special keys">Special keys</a></h3>For example every time we post a new message, we <b>need</b> to increment the <code name="code" class="python">global:nextPostId</code> key. How to fix this problem? A Single server will get a lot if increments. The simplest way to handle this is to have a dedicated server just for increments. This is probably an overkill btw unless you have really a lot of traffic. There is another trick. The ID does not really need to be an incremental number, but just <b>it needs to be unique</b>. So you can get a random string long enough to be unlikely (almost impossible, if it's md5-size) to collide, and you are done. We successfully eliminated our main problem to make it really horizontally scalable!<br/><br/>There is another one: global:timeline. There is no fix for this, if you need to take something in order you can split among different servers and <b>then merge</b> when you need to get the data back, or take it ordered and use a single key. Again if you really have so much posts per second, you can use a single server just for this. Remember that with commodity hardware Redis is able to handle 100000 writes for second, that's enough even for Twitter, I guess.<br/><br/>Please feel free to use the comments below for questions and feedbacks.
The first thing to do is to hash the key and issue the request on different servers based on the key hash. There are a lot of well known algorithms to do so, for example check the Redis Ruby library client that implements <i>consistent hashing</i>, but the general idea is that you can turn your key into a number, and than take the reminder of the division of this number by the number of servers you have:<br/><br/><pre class="codeblock python python python python python python python python python python python python python python python python python python python python python python python python" name="code">
server_id = crc32(key) % number_of_servers
</pre>This has a lot of problems since if you add one server you need to move too much keys and so on, but this is the general idea even if you use a better hashing scheme like consistent hashing.<br/><br/>Ok, are key accesses distributed among the key space? Well, all the user data will be partitioned among different servers. There are no inter-keys operations used (like SINTER, otherwise you need to care that things you want to intersect will end in the same server. <b>This is why Redis unlike memcached does not force a specific hashing scheme, it's application specific</b>). Btw there are keys that are accessed more frequently.<h3><a name="Special keys">Special keys</a></h3>For example every time we post a new message, we <b>need</b> to increment the <code name="code" class="python">global:nextPostId</code> key. How to fix this problem? A Single server will get a lot if increments. The simplest way to handle this is to have a dedicated server just for increments. This is probably an overkill btw unless you have really a lot of traffic. There is another trick. The ID does not really need to be an incremental number, but just <b>it needs to be unique</b>. So you can get a random string long enough to be unlikely (almost impossible, if it's md5-size) to collide, and you are done. We successfully eliminated our main problem to make it really horizontally scalable!<br/><br/>There is another one: global:timeline. There is no fix for this, if you need to take something in order you can split among different servers and <b>then merge</b> when you need to get the data back, or take it ordered and use a single key. Again if you really have so much posts per second, you can use a single server just for this. Remember that with commodity hardware Redis is able to handle 100000 writes for second, that's enough even for Twitter, I guess.<br/><br/>Please feel free to use the comments below for questions and feedbacks.