Make replication faster (biggest gain for small number of slaves)

[redis.git] / doc / FAQ.html
diff --git a/doc/FAQ.html b/doc/FAQ.html

index 7c012b2cfe3edaeb472929d50ee975db6a908086..531fb708e65dcd0109b4f241855fec7f5792c1fb 100644 (file)
--- a/doc/FAQ.html
+++ b/doc/FAQ.html
@@ -58,7 +58,7 @@ Redis for the same objects. This happens because when data is in
  memory is full of pointers, reference counters and other metadata. Add 
  to this malloc fragmentation and need to return word-aligned chunks of 
  memory and you have a clear picture of what happens. So this means to 
-have 10 times the I/O between memory and disk than otherwise needed.<h1><a name="Is there something I can do to lower the Redis memory usage?">Is there something I can do to lower the Redis memory usage?</a></h1>Yes, try to compile it with 32 bit target if you are using a 64 bit box.<br/><br/>If you are using Redis &gt;= 1.3, try using the Hash data type, it can save a lot of memory.<br/><br/>If you are using hashes or any other type with values bigger than 128 bytes try also this to lower the RSS usage (Resident Set Size): EXPORT MMAP_THRESHOLD=4096<h1><a name="I have an empty Redis server but INFO and logs are reporting megabytes of memory in use!">I have an empty Redis server but INFO and logs are reporting megabytes of memory in use!</a></h1>This may happen and it's prefectly ok. Redis objects are small C structures allocated and freed a lot of times. This costs a lot of CPU so instead of being freed, released objects are taken into a free list and reused when needed. This memory is taken exactly by this free objects ready to be reused.<h1><a name="What happens if Redis runs out of memory?">What happens if Redis runs out of memory?</a></h1>With modern operating systems malloc() returning NULL is not common, usually the server will start swapping and Redis performances will be disastrous so you'll know it's time to use more Redis servers or get more RAM.<br/><br/>The INFO command (work in progress in this days) will report the amount of memory Redis is using so you can write scripts that monitor your Redis servers checking for critical conditions.<br/><br/>You can also use the &quot;maxmemory&quot; option in the config file to put a limit to the memory Redis can use. If this limit is reached Redis will start to reply with an error to write commands (but will continue to accept read-only commands).<h1><a name="Does Redis use more memory running in 64 bit boxes? Can I use 32 bit Redis in 64 bit systems?">Does Redis use more memory running in 64 bit boxes? Can I use 32 bit Redis in 64 bit systems?</a></h1>Redis uses a lot more memory when compiled for 64 bit target, especially if the dataset is composed of many small keys and values. Such a database will, for instance, consume 50 MB of RAM when compiled for the 32 bit target, and 80 MB for 64 bit! That's a big difference.<br/><br/>You can run 32 bit Redis binaries in a 64 bit Linux and Mac OS X system without problems. For OS X just use <b>make 32bit</b>. For Linux instead, make sure you have <b>libc6-dev-i386</b> installed, then use <b>make 32bit</b> if you are using the latest Git version. Instead for Redis &lt;= 1.2.2 you have to edit the Makefile and replace &quot;-arch i386&quot; with &quot;-m32&quot;.<br/><br/>If your application is already able to perform application-level sharding, it is very advisable to run N instances of Redis 32bit against a big 64 bit Redis box (with more than 4GB of RAM) instead than a single 64 bit instance, as this is much more memory efficient. <h1><a name="How much time it takes to load a big database at server startup?">How much time it takes to load a big database at server startup?</a></h1>Just an example on normal hardware: It takes about 45 seconds to restore a 2 GB database on a fairly standard system, no RAID. This can give you some kind of feeling about the order of magnitude of the time needed to load data when you restart the server.<h1><a name="Background saving is failing with a fork() error under Linux even if I've a lot of free RAM!">Background saving is failing with a fork() error under Linux even if I've a lot of free RAM!</a></h1>Short answer: <code name="code" class="python">echo 1 &gt; /proc/sys/vm/overcommit_memory</code> :)<br/><br/>And now the long one:<br/><br/>Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will <i>share</i> the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the <code name="code" class="python">overcommit_memory</code> setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.<br/><br/>Setting <code name="code" class="python">overcommit_memory</code> to 1 says Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.<h1><a name="Are Redis on disk snapshots atomic?">Are Redis on disk snapshots atomic?</a></h1>Yes, redis background saving process is always fork(2)ed when the server is outside of the execution of a command, so every command reported to be atomic in RAM is also atomic from the point of view of the disk snapshot.<h1><a name="Redis is single threaded, how can I exploit multiple CPU / cores?">Redis is single threaded, how can I exploit multiple CPU / cores?</a></h1>Simply start multiple instances of Redis in different ports in the same box and threat them as different servers! Given that Redis is a distributed database anyway in order to scale you need to think in terms of multiple computational units. At some point a single box may not be enough anyway.<br/><br/>In general key-value databases are very scalable because of the property that different keys can stay on different servers independently.<br/><br/>In Redis there are client libraries such Redis-rb (the Ruby client) that are able to handle multiple servers automatically using <i>consistent hashing</i>. We are going to implement consistent hashing in all the other major client libraries. If you use a different language you can implement it yourself otherwise just hash the key before to SET / GET it from a given server. For example imagine to have N Redis servers, server-0, server-1, ..., server-N. You want to store the key &quot;foo&quot;, what's the right server where to put &quot;foo&quot; in order to distribute keys evenly among different servers? Just perform the <i>crc</i> = CRC32(&quot;foo&quot;), then <i>servernum</i> = <i>crc</i> % N (the rest of the division for N). This will give a number between 0 and N-1 for every key. Connect to this server and store the key. The same for gets.<br/><br/>This is a basic way of performing key partitioning, consistent hashing is much better and this is why after Redis 1.0 will be released we'll try to implement this in every widely used client library starting from Python and PHP (Ruby already implements this support).<h1><a name="I'm using some form of key hashing for partitioning, but what about SORT BY?">I'm using some form of key hashing for partitioning, but what about SORT BY?</a></h1>With <a href="SortCommand.html">SORT</a> BY you need that all the <i>weight keys</i> are in the same Redis instance of the list/set you are trying to sort. In order to make this possible we developed a concept called <i>key tags</i>. A key tag is a special pattern inside a key that, if preset, is the only part of the key hashed in order to select the server for this key. For example in order to hash the key &quot;foo&quot; I simply perform the CRC32 checksum of the whole string, but if this key has a pattern in the form of the characters {...} I only hash this substring. So for example for the key &quot;foo{bared}&quot; the key hashing code will simply perform the CRC32 of &quot;bared&quot;. This way using key tags you can ensure that related keys will be stored on the same Redis instance just using the same key tag for all this keys. Redis-rb already implements key tags.<h1><a name="What is the maximum number of keys a single Redis instance can hold? and what the max number of elements in a List, Set, Ordered Set?">What is the maximum number of keys a single Redis instance can hold? and what the max number of elements in a List, Set, Ordered Set?</a></h1>In theory Redis can handle up to 2<sup>32 keys, and was tested in practice to handle at least 150 million of keys per instance. We are working in order to experiment with larger values.<br/><br/>Every list, set, and ordered set, can hold 2</sup>32 elements.<br/><br/>Actually Redis internals are ready to allow up to 2<sup>64 elements but the current disk dump format don't support this, and there is a lot time to fix this issues in the future as currently even with 128 GB of RAM it's impossible to reach 2</sup>32 elements.<h1><a name="What Redis means actually?">What Redis means actually?</a></h1>Redis means two things:
+have 10 times the I/O between memory and disk than otherwise needed.<h1><a name="Is there something I can do to lower the Redis memory usage?">Is there something I can do to lower the Redis memory usage?</a></h1>Yes, try to compile it with 32 bit target if you are using a 64 bit box.<br/><br/>If you are using Redis &gt;= 1.3, try using the Hash data type, it can save a lot of memory.<br/><br/>If you are using hashes or any other type with values bigger than 128 bytes try also this to lower the RSS usage (Resident Set Size): EXPORT MMAP_THRESHOLD=4096<h1><a name="I have an empty Redis server but INFO and logs are reporting megabytes of memory in use!">I have an empty Redis server but INFO and logs are reporting megabytes of memory in use!</a></h1>This may happen and it's prefectly ok. Redis objects are small C structures allocated and freed a lot of times. This costs a lot of CPU so instead of being freed, released objects are taken into a free list and reused when needed. This memory is taken exactly by this free objects ready to be reused.<h1><a name="What happens if Redis runs out of memory?">What happens if Redis runs out of memory?</a></h1>With modern operating systems malloc() returning NULL is not common, usually the server will start swapping and Redis performances will be disastrous so you'll know it's time to use more Redis servers or get more RAM.<br/><br/>The INFO command (work in progress in this days) will report the amount of memory Redis is using so you can write scripts that monitor your Redis servers checking for critical conditions.<br/><br/>You can also use the &quot;maxmemory&quot; option in the config file to put a limit to the memory Redis can use. If this limit is reached Redis will start to reply with an error to write commands (but will continue to accept read-only commands).<h1><a name="Does Redis use more memory running in 64 bit boxes? Can I use 32 bit Redis in 64 bit systems?">Does Redis use more memory running in 64 bit boxes? Can I use 32 bit Redis in 64 bit systems?</a></h1>Redis uses a lot more memory when compiled for 64 bit target, especially if the dataset is composed of many small keys and values. Such a database will, for instance, consume 50 MB of RAM when compiled for the 32 bit target, and 80 MB for 64 bit! That's a big difference.<br/><br/>You can run 32 bit Redis binaries in a 64 bit Linux and Mac OS X system without problems. For OS X just use <b>make 32bit</b>. For Linux instead, make sure you have <b>libc6-dev-i386</b> installed, then use <b>make 32bit</b> if you are using the latest Git version. Instead for Redis &lt;= 1.2.2 you have to edit the Makefile and replace &quot;-arch i386&quot; with &quot;-m32&quot;.<br/><br/>If your application is already able to perform application-level sharding, it is very advisable to run N instances of Redis 32bit against a big 64 bit Redis box (with more than 4GB of RAM) instead than a single 64 bit instance, as this is much more memory efficient. <h1><a name="How much time it takes to load a big database at server startup?">How much time it takes to load a big database at server startup?</a></h1>Just an example on normal hardware: It takes about 45 seconds to restore a 2 GB database on a fairly standard system, no RAID. This can give you some kind of feeling about the order of magnitude of the time needed to load data when you restart the server.<h1><a name="Background saving is failing with a fork() error under Linux even if I've a lot of free RAM!">Background saving is failing with a fork() error under Linux even if I've a lot of free RAM!</a></h1>Short answer: <code name="code" class="python">echo 1 &gt; /proc/sys/vm/overcommit_memory</code> :)<br/><br/>And now the long one:<br/><br/>Redis background saving schema relies on the copy-on-write semantic of fork in modern operating systems: Redis forks (creates a child process) that is an exact copy of the parent. The child process dumps the DB on disk and finally exits. In theory the child should use as much memory as the parent being a copy, but actually thanks to the copy-on-write semantic implemented by most modern operating systems the parent and child process will <i>share</i> the common memory pages. A page will be duplicated only when it changes in the child or in the parent. Since in theory all the pages may change while the child process is saving, Linux can't tell in advance how much memory the child will take, so if the <code name="code" class="python">overcommit_memory</code> setting is set to zero fork will fail unless there is as much free RAM as required to really duplicate all the parent memory pages, with the result that if you have a Redis dataset of 3 GB and just 2 GB of free memory it will fail.<br/><br/>Setting <code name="code" class="python">overcommit_memory</code> to 1 says Linux to relax and perform the fork in a more optimistic allocation fashion, and this is indeed what you want for Redis.<br/><br/>A good source to understand how Linux Virtual Memory work and other alternatives for <code name="code" class="python">overcommit_memory</code> and <code name="code" class="python">overcommit_ratio</code> is this classic from Red Hat Magaize, &quot;Understanding Virtual Memory&quot;: <a href="http://www.redhat.com/magazine/001nov04/features/vm/" target="_blank">http://www.redhat.com/magazine/001nov04/features/vm/</a> <h1><a name="Are Redis on disk snapshots atomic?">Are Redis on disk snapshots atomic?</a></h1>Yes, redis background saving process is always fork(2)ed when the server is outside of the execution of a command, so every command reported to be atomic in RAM is also atomic from the point of view of the disk snapshot.<h1><a name="Redis is single threaded, how can I exploit multiple CPU / cores?">Redis is single threaded, how can I exploit multiple CPU / cores?</a></h1>Simply start multiple instances of Redis in different ports in the same box and threat them as different servers! Given that Redis is a distributed database anyway in order to scale you need to think in terms of multiple computational units. At some point a single box may not be enough anyway.<br/><br/>In general key-value databases are very scalable because of the property that different keys can stay on different servers independently.<br/><br/>In Redis there are client libraries such Redis-rb (the Ruby client) that are able to handle multiple servers automatically using <i>consistent hashing</i>. We are going to implement consistent hashing in all the other major client libraries. If you use a different language you can implement it yourself otherwise just hash the key before to SET / GET it from a given server. For example imagine to have N Redis servers, server-0, server-1, ..., server-N. You want to store the key &quot;foo&quot;, what's the right server where to put &quot;foo&quot; in order to distribute keys evenly among different servers? Just perform the <i>crc</i> = CRC32(&quot;foo&quot;), then <i>servernum</i> = <i>crc</i> % N (the rest of the division for N). This will give a number between 0 and N-1 for every key. Connect to this server and store the key. The same for gets.<br/><br/>This is a basic way of performing key partitioning, consistent hashing is much better and this is why after Redis 1.0 will be released we'll try to implement this in every widely used client library starting from Python and PHP (Ruby already implements this support).<h1><a name="I'm using some form of key hashing for partitioning, but what about SORT BY?">I'm using some form of key hashing for partitioning, but what about SORT BY?</a></h1>With <a href="SortCommand.html">SORT</a> BY you need that all the <i>weight keys</i> are in the same Redis instance of the list/set you are trying to sort. In order to make this possible we developed a concept called <i>key tags</i>. A key tag is a special pattern inside a key that, if preset, is the only part of the key hashed in order to select the server for this key. For example in order to hash the key &quot;foo&quot; I simply perform the CRC32 checksum of the whole string, but if this key has a pattern in the form of the characters {...} I only hash this substring. So for example for the key &quot;foo{bared}&quot; the key hashing code will simply perform the CRC32 of &quot;bared&quot;. This way using key tags you can ensure that related keys will be stored on the same Redis instance just using the same key tag for all this keys. Redis-rb already implements key tags.<h1><a name="What is the maximum number of keys a single Redis instance can hold? and what the max number of elements in a List, Set, Ordered Set?">What is the maximum number of keys a single Redis instance can hold? and what the max number of elements in a List, Set, Ordered Set?</a></h1>In theory Redis can handle up to 2<sup>32 keys, and was tested in practice to handle at least 150 million of keys per instance. We are working in order to experiment with larger values.<br/><br/>Every list, set, and ordered set, can hold 2</sup>32 elements.<br/><br/>Actually Redis internals are ready to allow up to 2<sup>64 elements but the current disk dump format don't support this, and there is a lot time to fix this issues in the future as currently even with 128 GB of RAM it's impossible to reach 2</sup>32 elements.<h1><a name="What Redis means actually?">What Redis means actually?</a></h1>Redis means two things:
  <ul><li> it's a joke on the word Redistribute (instead to use just a Relational DB redistribute your workload among Redis servers)</li><li> it means REmote DIctionary Server</li></ul>
  <h1><a name="Why did you started the Redis project?">Why did you started the Redis project?</a></h1>In order to scale <a href="http://lloogg.com" target="_blank">LLOOGG</a>. But after I got the basic server working I liked the idea to share the work with other guys, and Redis was turned into an open source project.
                  </div>