</div>
<div class="narrow">
- <h1><a name="A case study: Design and implementation of a simple Twitter clone using only the Redis key-value store as database and PHP">A case study: Design and implementation of a simple Twitter clone using only the Redis key-value store as database and PHP</a></h1>In this article I'll explain the design and the implementation of a <a href="http://retwis.antirez.com" target="_blank">simple clone of Twitter</a> written using PHP and <a href="http://code.google.com/p/redis/" target="_blank">Redis</a> as only database. The programming community uses to look at key-value stores like special databases that can't be used as drop in replacement for a relational database for the development of web applications. This article will try to prove the contrary.<br/><br/>Our Twitter clone, <a href="http://retwis.antirez.com" target="_blank">called Retwis</a>, is structurally simple, has very good performances, and can be distributed among N web servers and M Redis servers with very little efforts. You can find the source code <a href="http://code.google.com/p/redis/downloads/list" target="_blank">here</a>.<br/><br/>We use PHP for the example since it can be read by everybody. The same (or... much better) results can be obtained using Ruby, Python, Erlang, and so on.
-<h1><a name="Key-value stores basics">Key-value stores basics</a></h1>
+ <h1><a name="A case study: Design and implementation of a simple Twitter clone using only the Redis key-value store as database and PHP">A case study: Design and implementation of a simple Twitter clone using only the Redis key-value store as database and PHP</a></h1>In this article I'll explain the design and the implementation of a <a href="http://retwis.antirez.com" target="_blank">simple clone of Twitter</a> written using PHP and <a href="http://code.google.com/p/redis/" target="_blank">Redis</a> as only database. The programming community uses to look at key-value stores like special databases that can't be used as drop in replacement for a relational database for the development of web applications. This article will try to prove the contrary.<br/><br/>Our Twitter clone, <a href="http://retwis.antirez.com" target="_blank">called Retwis</a>, is structurally simple, has very good performances, and can be distributed among N web servers and M Redis servers with very little efforts. You can find the source code <a href="http://code.google.com/p/redis/downloads/list" target="_blank">here</a>.<br/><br/>We use PHP for the example since it can be read by everybody. The same (or... much better) results can be obtained using Ruby, Python, Erlang, and so on.<br/><br/><b>News! <a href="http://retwisrb.danlucraft.com/" target="_blank">Retwis-rb</a> is a port of Retwis to Ruby and Sinatra written by Daniel Lucraft!</b> With full source code included of course, the git repository is linked at the end of the Retwis-RB page. The rest of this article targets PHP, but Ruby programmers can also check the other source code, it conceptually very similar.<h1><a name="Key-value stores basics">Key-value stores basics</a></h1>
The essence of a key-value store is the ability to store some data, called <i>value</i>, inside a key. This data can later be retrieved only if we know the exact key used to store it. There is no way to search something by value. So for example I can use the command SET to store the value <b>bar</b> at key <b>foo</b>:<br/><br/><pre class="codeblock python" name="code">
SET foo bar
</pre>Redis will store our data permanently, so we can later ask for "<i>What is the value stored at key foo?</i>" and Redis will reply with <b>bar</b>:<br/><br/><pre class="codeblock python python" name="code">
The first thing to do is to hash the key and issue the request on different servers based on the key hash. There are a lot of well known algorithms to do so, for example check the Redis Ruby library client that implements <i>consistent hashing</i>, but the general idea is that you can turn your key into a number, and than take the reminder of the division of this number by the number of servers you have:<br/><br/><pre class="codeblock python python python python python python python python python python python python python python python python python python python python python python python python" name="code">
server_id = crc32(key) % number_of_servers
</pre>This has a lot of problems since if you add one server you need to move too much keys and so on, but this is the general idea even if you use a better hashing scheme like consistent hashing.<br/><br/>Ok, are key accesses distributed among the key space? Well, all the user data will be partitioned among different servers. There are no inter-keys operations used (like SINTER, otherwise you need to care that things you want to intersect will end in the same server. <b>This is why Redis unlike memcached does not force a specific hashing scheme, it's application specific</b>). Btw there are keys that are accessed more frequently.<h3><a name="Special keys">Special keys</a></h3>For example every time we post a new message, we <b>need</b> to increment the <code name="code" class="python">global:nextPostId</code> key. How to fix this problem? A Single server will get a lot if increments. The simplest way to handle this is to have a dedicated server just for increments. This is probably an overkill btw unless you have really a lot of traffic. There is another trick. The ID does not really need to be an incremental number, but just <b>it needs to be unique</b>. So you can get a random string long enough to be unlikely (almost impossible, if it's md5-size) to collide, and you are done. We successfully eliminated our main problem to make it really horizontally scalable!<br/><br/>There is another one: global:timeline. There is no fix for this, if you need to take something in order you can split among different servers and <b>then merge</b> when you need to get the data back, or take it ordered and use a single key. Again if you really have so much posts per second, you can use a single server just for this. Remember that with commodity hardware Redis is able to handle 100000 writes for second, that's enough even for Twitter, I guess.<br/><br/>Please feel free to use the comments below for questions and feedbacks.
-
</div>
</div>