X-Git-Url: https://git.saurik.com/redis.git/blobdiff_plain/81d456450ac42b7cf78367ae3a89af1a543c9ff1..8fb13ce816b85ac414921ecca420671bf74a3eea:/doc/TwitterAlikeExample.html diff --git a/doc/TwitterAlikeExample.html b/doc/TwitterAlikeExample.html new file mode 100644 index 00000000..0c75cc93 --- /dev/null +++ b/doc/TwitterAlikeExample.html @@ -0,0 +1,250 @@ + + + +
+ + + ++SET foo bar +Redis will store our data permanently, so we can later ask for "What is the value stored at key foo?" and Redis will reply with bar:
+GET foo => bar +Other common operations provided by key-value stores are DEL used to delete a given key, and the associated value, SET-if-not-exists (called SETNX on Redis) that sets a key only if it does not already exist, and INCR that is able to atomically increment a number stored at a given key:
+SET foo 10 +INCR foo => 11 +INCR foo => 12 +INCR foo => 13 +
+x = GET foo +x = x + 1 +SET foo x +The problem is that doing the increment this way will work as long as there is only a client working with the value x at a time. See what happens if two computers are accessing this data at the same time:
+x = GET foo (yields 10) +y = GET foo (yields 10) +x = x + 1 (x is now 11) +y = y + 1 (y is now 11) +SET foo x (foo is now 11) +SET foo y (foo is now 11) +Something is wrong with that! We incremented the value two times, but instead to go from 10 to 12 our key holds 11. This is because the INCR operation done with
GET / increment / SET
is not an atomic operation. Instead the INCR provided by Redis, Memcached, ..., are atomic implementations, the server will take care to protect the get-increment-set for all the time needed to complete in order to prevent simultaneous accesses.+LPUSH mylist a (now mylist holds one element list 'a') +LPUSH mylist b (now mylist holds 'b,a') +LPUSH mylist c (now mylist holds 'c,b,a') +LPUSH means Left Push, that is, add an element to the left (or to the head) of the list stored at mylist. If the key mylist does not exist it is automatically created by Redis as an empty list before the PUSH operation. As you can imagine, there is also the RPUSH operation that adds the element on the right of the list (on the tail).
username:updates
for instance. There are operations to get data or information from Lists of course. For instance LRANGE returns a range of the list, or the whole list.+LRANGE mylist 0 1 => c,b +LRANGE uses zero-based indexes, that is the first element is 0, the second 1, and so on. The command aguments are
LRANGE key first-index last-index
. The last index argument can be negative, with a special meaning: -1 is the last element of the list, -2 the penultimate, and so on. So in order to get the whole list we can use:+LRANGE mylist 0 -1 => c,b,a +Other important operations are LLEN that returns the length of the list, and LTRIM that is like LRANGE but instead of returning the specified range trims the list, so it is like Get range from mylist, Set this range as new value but atomic. We will use only this List operations, but make sure to check the Redis documentation to discover all the List operations supported by Redis. +
+SADD myset a +SADD myset b +SADD myset foo +SADD myset bar +SCARD myset => 4 +SMEMBERS myset => bar,a,foo,b +Note that SMEMBERS does not return the elements in the same order we added them, since Sets are unsorted collections of elements. When you want to store the order it is better to use Lists instead. Some more operations against Sets:
+SADD mynewset b +SADD mynewset foo +SADD mynewset hello +SINTER myset mynewset => foo,b +SINTER can return the intersection between Sets but it is not limited to two sets, you may ask for intersection of 4,5 or 10000 Sets. Finally let's check how SISMEMBER works:
+SISMEMBER myset foo => 1 +SISMEMBER myset notamember => 0 +Ok I think we are ready to start coding! +
+INCR global:nextUserId => 1000 +SET uid:1000:username antirez +SET uid:1000:password p1pp0 +We use the global:nextUserId key in order to always get an unique ID for every new user. Then we use this unique ID to populate all the other keys holding our user data. This is a Design Pattern with key-values stores! Keep it in mind. +Besides the fields already defined, we need some more stuff in order to fully define an User. For example sometimes it can be useful to be able to get the user ID from the username, so we set this key too:
+SET username:antirez:uid 1000 +This may appear strange at first, but remember that we are only able to access data by key! It's not possible to tell Redis to return the key that holds a specific value. This is also our strength, this new paradigm is forcing us to organize the data so that everything is accessible by primary key, speaking with relational DBs language. +
+uid:1000:followers => Set of uids of all the followers users +uid:1000:following => Set of uids of all the following users +Another important thing we need is a place were we can add the updates to display in the user home page. We'll need to access this data in chronological order later, from the most recent update to the older ones, so the perfect kind of Value for this work is a List. Basically every new update will be LPUSHed in the user updates key, and thanks to LRANGE we can implement pagination and so on. Note that we use the words updates and posts interchangeably, since updates are actually "little posts" in some way.
+uid:1000:posts => a List of post ids, every new post is LPUSHed here. ++
+SET uid:1000:auth fea5e81ac8ca77622bed1c2132a021f9 +SET auth:fea5e81ac8ca77622bed1c2132a021f9 1000 +In order to authenticate an user we'll do this simple work (login.php): +
<username>
:uid key actually exists+include("retwis.php"); + +# Form sanity checks +if (!gt("username") || !gt("password")) + goback("You need to enter both username and password to login."); + +# The form is ok, check if the username is available +$username = gt("username"); +$password = gt("password"); +$r = redisLink(); +$userid = $r->get("username:$username:id"); +if (!$userid) + goback("Wrong username or password"); +$realpassword = $r->get("uid:$userid:password"); +if ($realpassword != $password) + goback("Wrong useranme or password"); + +# Username / password OK, set the cookie and redirect to index.php +$authsecret = $r->get("uid:$userid:auth"); +setcookie("auth",$authsecret,time()+3600*24*365); +header("Location: index.php"); +This happens every time the users log in, but we also need a function isLoggedIn in order to check if a given user is already authenticated or not. These are the logical steps preformed by the
isLoggedIn
function:
+<authcookie>
<authcookie>
exists, and what the value (the user id) is (1000 in the exmple).+function isLoggedIn() { + global $User, $_COOKIE; + + if (isset($User)) return true; + + if (isset($_COOKIE['auth'])) { + $r = redisLink(); + $authcookie = $_COOKIE['auth']; + if ($userid = $r->get("auth:$authcookie")) { + if ($r->get("uid:$userid:auth") != $authcookie) return false; + loadUserInfo($userid); + return true; + } + } + return false; +} + +function loadUserInfo($userid) { + global $User; + + $r = redisLink(); + $User['id'] = $userid; + $User['username'] = $r->get("uid:$userid:username"); + return true; +} +
loadUserInfo
as separated function is an overkill for our application, but it's a good template for a complex application. The only thing it's missing from all the authentication is the logout. What we do on logout? That's simple, we'll just change the random string in uid:1000:auth, remove the old auth:<oldauthstring>
and add a new auth:<newauthstring>
.<randomstring>
, but double check it against uid:1000:auth. The true authentication string is the latter, the auth:<randomstring>
is just an authentication key that may even be volatile, or if there are bugs in the program or a script gets interrupted we may even end with multiple auth:<something>
keys pointing to the same user id. The logout code is the following (logout.php):+include("retwis.php"); + +if (!isLoggedIn()) { + header("Location: index.php"); + exit; +} + +$r = redisLink(); +$newauthsecret = getrand(); +$userid = $User['id']; +$oldauthsecret = $r->get("uid:$userid:auth"); + +$r->set("uid:$userid:auth",$newauthsecret); +$r->set("auth:$newauthsecret",$userid); +$r->delete("auth:$oldauthsecret"); + +header("Location: index.php"); +That is just what we described and should be simple to undestand. +
+INCR global:nextPostId => 10343 +SET post:10343 "$owner_id|$time|I'm having fun with Retwis" +As you can se the user id and time of the post are stored directly inside the string, we don't need to lookup by time or user id in the example application so it is better to compact everything inside the post string.
+include("retwis.php"); + +if (!isLoggedIn() || !gt("status")) { + header("Location:index.php"); + exit; +} + +$r = redisLink(); +$postid = $r->incr("global:nextPostId"); +$status = str_replace("\n"," ",gt("status")); +$post = $User['id']."|".time()."|".$status; +$r->set("post:$postid",$post); +$followers = $r->smembers("uid:".$User['id'].":followers"); +if ($followers === false) $followers = Array(); +$followers[] = $User['id']; /* Add the post to our own posts too */ + +foreach($followers as $fid) { + $r->push("uid:$fid:posts",$postid,false); +} +# Push the post on the timeline, and trim the timeline to the +# newest 1000 elements. +$r->push("global:timeline",$postid,false); +$r->ltrim("global:timeline",0,1000); + +header("Location: index.php"); +The core of the function is the
foreach
. We get using SMEMBERS all the followers of the current user, then the loop will LPUSH the post against the uid:<userid>
:posts of every follower.+function showPost($id) { + $r = redisLink(); + $postdata = $r->get("post:$id"); + if (!$postdata) return false; + + $aux = explode("|",$postdata); + $id = $aux[0]; + $time = $aux[1]; + $username = $r->get("uid:$id:username"); + $post = join(array_splice($aux,2,count($aux)-2),"|"); + $elapsed = strElapsed($time); + $userlink = "<a class=\"username\" href=\"profile.php?u=".urlencode($username)."\">".utf8entities($username)."</a>"; + + echo('<div class="post">'.$userlink.' '.utf8entities($post)."<br>"); + echo('<i>posted '.$elapsed.' ago via web</i></div>'); + return true; +} + +function showUserPosts($userid,$start,$count) { + $r = redisLink(); + $key = ($userid == -1) ? "global:timeline" : "uid:$userid:posts"; + $posts = $r->lrange($key,$start,$start+$count); + $c = 0; + foreach($posts as $p) { + if (showPost($p)) $c++; + if ($c == $count) break; + } + return count($posts) == $count+1; +} +
showPost
will simply convert and print a Post in HTML while showUserPosts
get range of posts passing them to showPosts
.+SADD uid:1000:following 1001 +SADD uid:1001:followers 1000 +Note the same pattern again and again, in theory with a relational database the list of following and followers is a single table with fields like
following_id
and follower_id
. With queries you can extract the followers or following of every user. With a key-value DB that's a bit different as we need to set both the 1000 is following 1001
and 1001 is followed by 1000
relations. This is the price to pay, but on the other side accessing the data is simpler and ultra-fast. And having this things as separated sets allows us to do interesting stuff, for example using SINTER we can have the intersection of 'following' of two different users, so we may add a feature to our Twitter clone so that it is able to say you at warp speed, when you visit somebody' else profile, "you and foobar have 34 followers in common" and things like that.+server_id = crc32(key) % number_of_servers +This has a lot of problems since if you add one server you need to move too much keys and so on, but this is the general idea even if you use a better hashing scheme like consistent hashing.
global:nextPostId
key. How to fix this problem? A Single server will get a lot if increments. The simplest way to handle this is to have a dedicated server just for increments. This is probably an overkill btw unless you have really a lot of traffic. There is another trick. The ID does not really need to be an incremental number, but just it needs to be unique. So you can get a random string long enough to be unlikely (almost impossible, if it's md5-size) to collide, and you are done. We successfully eliminated our main problem to make it really horizontally scalable!