git.saurik.com Git - redis.git/log

antirez [Thu, 22 Nov 2012 14:50:00 +0000 (15:50 +0100)]

EVALSHA is now case insensitive.

EVALSHA used to crash if the SHA1 was not lowercase (Issue #783).
Fixed using a case insensitive dictionary type for the sha -> script
map used for replication of scripts.
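
As a rough illustration of the idea (not the actual fix, which switches
the dictionary type), a case insensitive comparison of two SHA1 hex
digests could look like the sketch below. The helper name
sha1_hex_equal_nocase is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: compare two 40-character SHA1 hex digests,
 * ignoring the case of the hex letters. Returns 1 if they match. */
int sha1_hex_equal_nocase(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < 40; i++) {
        if (a[i] == '\0' || b[i] == '\0') return 0;   /* too short */
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    /* Both strings must end exactly after 40 characters. */
    return a[40] == '\0' && b[40] == '\0';
}
```

With a comparison like this, an uppercase and a lowercase spelling of
the same digest identify the same script.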

antirez [Thu, 22 Nov 2012 14:28:28 +0000 (15:28 +0100)]

Fix integer overflow in zunionInterGenericCommand().

This fixes issue #761.

antirez [Sat, 17 Nov 2012 11:11:13 +0000 (12:11 +0100)]

Test: MULTI state is cleared after EXECABORT error.

antirez [Sat, 17 Nov 2012 11:09:17 +0000 (12:09 +0100)]

Test: make sure EXEC fails after previous transaction errors.

antirez [Sat, 17 Nov 2012 10:17:54 +0000 (11:17 +0100)]

Test: MULTI/EXEC tests moved into multi.tcl.

antirez [Thu, 15 Nov 2012 19:11:05 +0000 (20:11 +0100)]

Safer handling of MULTI/EXEC on errors.

After the transaction starts with a MULTI, the previous behavior was to
return an error on problems such as the maxmemory limit being reached,
but still to execute the transaction with the subset of queued commands
on EXEC.

While it is true that the client was able to check for errors by
distinguishing a QUEUED reply from an error reply, MULTI/EXEC in most
client implementations uses pipelining for speed, so all the commands
and EXEC are sent without caring about replies.

With this change:

1) EXEC fails if at least one command was not queued because of an
error. The EXECABORT error is used.
2) A generic error is always reported on EXEC.
3) The client DISCARDs the MULTI state after a failed EXEC, otherwise
pipelining multiple transactions would be basically impossible:
After a failed EXEC the next transaction would be simply queued as
the tail of the previous transaction.
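
The three rules above can be sketched as a small state machine. This is
a toy model under stated assumptions: the txn_state structure, the
function names and the return conventions are all hypothetical, and the
real server code is organized differently.

```c
#include <assert.h>
#include <string.h>

#define MAX_QUEUED 16

/* Hypothetical per-client transaction state. */
typedef struct {
    int in_multi;                  /* set by MULTI */
    int dirty;                     /* set when a command failed to queue */
    int queued;                    /* number of queued commands */
    const char *queue[MAX_QUEUED];
} txn_state;

void txn_multi(txn_state *t) {
    memset(t, 0, sizeof(*t));
    t->in_multi = 1;
}

/* Queue a command. `accepted` is 0 when the server would reject it,
 * for example because the maxmemory limit was reached.
 * Returns 1 for a QUEUED reply, 0 for an error reply. */
int txn_queue(txn_state *t, const char *cmd, int accepted) {
    assert(t->in_multi);
    if (!accepted || t->queued == MAX_QUEUED) {
        t->dirty = 1;              /* remember the failure for EXEC */
        return 0;
    }
    t->queue[t->queued++] = cmd;
    return 1;
}

/* EXEC: returns the number of commands that would run, or -1 to signal
 * an EXECABORT error. The MULTI state is discarded in both cases, so
 * the next pipelined transaction starts from a clean state. */
int txn_exec(txn_state *t) {
    assert(t->in_multi);
    int result = t->dirty ? -1 : t->queued;
    memset(t, 0, sizeof(*t));
    return result;
}
```

A pipelining client can then ignore the individual replies and only
check the reply to EXEC.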

antirez [Thu, 22 Nov 2012 09:08:44 +0000 (10:08 +0100)]

Make bio.c threads killable ASAP if needed.

We use this new bio.c feature in order to stop our I/O threads if there
is a memory test to do on crash. In this case we don't want anything
else than the main thread to run, otherwise the other threads may mess
with the heap and the memory test will report a false positive.

antirez [Wed, 21 Nov 2012 12:19:38 +0000 (13:19 +0100)]

Fast memory test on Redis crash.

antirez [Wed, 21 Nov 2012 12:17:38 +0000 (13:17 +0100)]

Use more fine grained HAVE macros instead of HAVE_PROCFS.

antirez [Mon, 19 Nov 2012 11:02:08 +0000 (12:02 +0100)]

Children creating AOF or RDB files now report memory used by COW.

Finally Redis is able to report the amount of memory used by
copy-on-write while saving an RDB or writing an AOF file in background.

Note that this information is currently only logged (at NOTICE level)
and not shown in INFO because this is less trivial (but surely doable
with some minor form of interprocess communication).

The reason we can't capture this information on the parent before we
call wait3() is that the Linux kernel will release the child memory
ASAP, and only retain the minimal state for the process that is useful
to report the child termination to the parent.

The COW size is obtained by summing all the Private_Dirty fields found
in the "smaps" file inside the proc filesystem for the process.

All this is Linux specific and is not available on other systems.
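
A minimal sketch of the summing step follows. It assumes the usual
smaps line layout ("Private_Dirty:" followed by a size in kB); the
function names are hypothetical and are not the ones used in the Redis
source.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 and stores the size in kB if `line` is a Private_Dirty
 * field such as "Private_Dirty:        12 kB", otherwise returns 0. */
int parse_private_dirty_kb(const char *line, unsigned long *kb) {
    assert(line != NULL && kb != NULL);
    return sscanf(line, "Private_Dirty: %lu kB", kb) == 1;
}

/* Sum all the Private_Dirty fields of an smaps file, in bytes.
 * Returns 0 if the file cannot be opened (for example off Linux). */
size_t private_dirty_bytes(const char *smaps_path) {
    char line[1024];
    size_t bytes = 0;
    unsigned long kb;
    FILE *fp = fopen(smaps_path, "r");

    if (fp == NULL) return 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (parse_private_dirty_kb(line, &kb)) bytes += (size_t)kb * 1024;
    }
    fclose(fp);
    return bytes;
}
```

A child process could call private_dirty_bytes("/proc/self/smaps")
right before exiting and log the result.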

antirez [Mon, 19 Nov 2012 10:24:56 +0000 (11:24 +0100)]

zmalloc_get_private_dirty() function added (Linux only).

For non-Linux systems it just returns 0.

This function is useful to estimate copy-on-write because of children
saving stuff on disk.

antirez [Wed, 14 Nov 2012 11:52:38 +0000 (12:52 +0100)]

zmalloc: kill unused __size parameter in update_zmalloc_stat_alloc() macro.

antirez [Wed, 14 Nov 2012 11:21:23 +0000 (12:21 +0100)]

Merge branch 'migrate-cache' into unstable

antirez [Wed, 14 Nov 2012 11:12:52 +0000 (12:12 +0100)]

Test: more MIGRATE tests.

antirez [Wed, 14 Nov 2012 10:30:24 +0000 (11:30 +0100)]

MIGRATE: retry one time on I/O error.

Now that we cache connections, a retry attempt makes sure that the
operation doesn't fail just because there is an existing connection error
on the socket, like the other end closing the connection.

Unfortunately this condition is not detectable using
getsockopt(SO_ERROR), so the only option left is to retry.

We don't retry on timeouts.

antirez [Tue, 13 Nov 2012 17:11:48 +0000 (18:11 +0100)]

Test: check if MIGRATE is caching connections.

antirez [Mon, 12 Nov 2012 22:04:36 +0000 (23:04 +0100)]

TTL API change: TTL returns -2 for non existing keys.

The previous behavior was to return -1 if:

1) Existing key but without an expire set.
2) Non existing key.

Now the second case is handled differently, and TTL will return -2
if the key does not exist at all.

PTTL follows the same behavior as well.
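
The new return values can be summarized by a tiny decision function.
This is only a sketch of the semantics described above; ttl_reply and
its parameters are hypothetical names.

```c
#include <assert.h>

/* Value replied by TTL under the new semantics:
 *   -2  the key does not exist
 *   -1  the key exists but has no expire set
 *  >=0  remaining time to live */
long long ttl_reply(int key_exists, int has_expire, long long remaining) {
    if (!key_exists) return -2;
    if (!has_expire) return -1;
    assert(remaining >= 0);
    return remaining;
}
```

A client can now tell a missing key from a persistent one with a single
call, without an extra EXISTS.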

antirez [Mon, 12 Nov 2012 14:04:54 +0000 (15:04 +0100)]

MIGRATE: fix default timeout to 1000 milliseconds.

When a timeout <= 0 is provided we set a default timeout of 1 second.
It was set to 1 millisecond because of an error introduced by a recent
change.

antirez [Mon, 12 Nov 2012 13:01:56 +0000 (14:01 +0100)]

MIGRATE count of cached sockets in INFO output.

antirez [Mon, 12 Nov 2012 13:00:59 +0000 (14:00 +0100)]

MIGRATE timeout should be in milliseconds.

While it is documented that the MIGRATE timeout is in milliseconds, it
was in seconds instead. This commit fixes the problem.

antirez [Sun, 11 Nov 2012 23:45:10 +0000 (00:45 +0100)]

MIGRATE TCP connections caching.

By caching TCP connections used by MIGRATE to chat with other Redis
instances a 5x performance improvement was measured with
redis-benchmark against small keys.

This can dramatically speed up cluster resharding and other processes
where a high load of MIGRATE commands is used.

antirez [Thu, 8 Nov 2012 18:14:29 +0000 (19:14 +0100)]

antirez [Thu, 8 Nov 2012 17:43:20 +0000 (18:43 +0100)]

Make clear that contributing code to the Redis project means to release it under the terms of the BSD license.

antirez [Thu, 8 Nov 2012 17:25:23 +0000 (18:25 +0100)]

BSD license added to every C source and header file.

antirez [Wed, 7 Nov 2012 14:32:27 +0000 (15:32 +0100)]

COPY and REPLACE options for MIGRATE.

With COPY now MIGRATE does not remove the key from the source instance.
With REPLACE it uses RESTORE REPLACE on the target host so that even if
the key already exists in the target instance it will be overwritten.

The options can be used together.

antirez [Wed, 7 Nov 2012 09:57:23 +0000 (10:57 +0100)]

REPLACE option for RESTORE.

The REPLACE option deletes an existing key with the same name (if any)
and materializes the new one. The default behavior without REPLACE is to
return an error if a key already exists.

antirez [Tue, 6 Nov 2012 19:25:34 +0000 (20:25 +0100)]

Type mismatch errors are now prefixed with WRONGTYPE.

So instead of replying with a generic error like:

-ERR ... wrong kind of value ...

now it replies with:

-WRONGTYPE ... wrong kind of value ...

This makes this particular error easy to check without resorting to
(fragile) pattern matching of the error string (however the error string
used to be consistent already).

Client libraries should return a specific exception type for this error.
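
On the client side, checking the prefix could look like the sketch
below. It assumes the client has the raw error line at hand (starting
with "-"); error_has_code is a hypothetical helper, not part of any
existing client library.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the error line `reply` (e.g. "-WRONGTYPE ...") carries
 * `code` as its first word, 0 otherwise. */
int error_has_code(const char *reply, const char *code) {
    assert(reply != NULL && code != NULL);
    size_t n = strlen(code);

    if (reply[0] != '-') return 0;                  /* not an error reply */
    if (strncmp(reply + 1, code, n) != 0) return 0; /* different prefix */
    /* The code must be a whole word, not the start of a longer one. */
    return reply[1 + n] == ' ' || reply[1 + n] == '\r' || reply[1 + n] == '\0';
}
```

A library can map a match on "WRONGTYPE" to its own exception or error
type and fall back to a generic error for everything else.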

Most of the commit is about fixing unit tests.

Salvatore Sanfilippo [Fri, 2 Nov 2012 11:10:47 +0000 (04:10 -0700)]

Merge pull request #741 from Run/typo

fix a typo in redis.h line 595 comment

antirez [Thu, 1 Nov 2012 21:39:39 +0000 (22:39 +0100)]

More robust handling of AOF rewrite child.

After the wait3() syscall we used to do something like this:

    if (pid == server.rdb_child_pid) {
        backgroundSaveDoneHandler(exitcode,bysignal);
    } else {
        ....
    }

So the AOF rewrite was handled in the else branch without actually
checking if the pid really matches. This commit makes the check explicit
and logs at WARNING level if the pid returned by wait3() matches
neither the RDB nor the AOF rewrite child.
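
The explicit check can be sketched as a small classification function.
The enum and the function name are hypothetical; -1 is assumed to mean
that no child of that kind is running.

```c
#include <assert.h>
#include <sys/types.h>

typedef enum { CHILD_RDB, CHILD_AOF, CHILD_UNKNOWN } child_kind;

/* Classify the pid returned by wait3() against the known children. */
child_kind classify_child(pid_t pid, pid_t rdb_child_pid, pid_t aof_child_pid) {
    assert(pid > 0);
    if (pid == rdb_child_pid) return CHILD_RDB;
    if (pid == aof_child_pid) return CHILD_AOF;
    return CHILD_UNKNOWN;   /* the caller should log a warning */
}
```

The caller dispatches to the matching "done" handler and only logs in
the CHILD_UNKNOWN case, instead of assuming that every pid that is not
the RDB child must be the AOF rewrite child.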

Yecheng Fu [Thu, 1 Nov 2012 10:14:55 +0000 (18:14 +0800)]

fix typo in comments (redis.c, networking.c)

antirez [Thu, 1 Nov 2012 21:10:45 +0000 (22:10 +0100)]

Unix socket clients properly displayed in MONITOR and CLIENT LIST.

This also fixes issue #745.

antirez [Thu, 1 Nov 2012 14:36:37 +0000 (15:36 +0100)]

32 bit build fixed on Linux.

It failed because of the way jemalloc was compiled (without passing the
right flags to make, but just to configure). Now the same set of flags
is also passed to the make command, fixing the issue.

This fixes issue #744

Runzhen Wang [Wed, 31 Oct 2012 18:14:22 +0000 (02:14 +0800)]

fix a typo in redis.h line 595 comment

Salvatore Sanfilippo [Wed, 31 Oct 2012 08:29:04 +0000 (01:29 -0700)]

Merge pull request #726 from yamt/typo

fix a typo in a comment

antirez [Wed, 31 Oct 2012 08:23:05 +0000 (09:23 +0100)]

Invert two sides of if expression in SET to avoid a lookup.

Because of the short circuit behavior of &&, inverting the two sides of
the if expression avoids a hash table lookup if the non-EX variant of
SET is called.

Thanks to Weibin Yao (@yaoweibin on github) for spotting this.

antirez [Tue, 30 Oct 2012 18:10:46 +0000 (19:10 +0100)]

No longer used macro rdbIsOpcode() removed.

antirez [Tue, 30 Oct 2012 17:57:20 +0000 (18:57 +0100)]

help.h update (adds bitop, bitcount, evalsha...)

antirez [Fri, 26 Oct 2012 14:06:25 +0000 (16:06 +0200)]

Ctrl+w support in linenoise.

antirez [Fri, 26 Oct 2012 13:38:21 +0000 (15:38 +0200)]

Marginally more robust glibc version test for sync_file_range detection.

charsyam [Thu, 25 Oct 2012 20:27:58 +0000 (04:27 +0800)]

patch config.h for sync_file_range

antirez [Thu, 25 Oct 2012 14:15:55 +0000 (16:15 +0200)]

Fix compilation on Linux kernels or glibc versions lacking sync_file_range().

This fixes issue #667.

Many thanks to Didier Spezia for the fix.

antirez [Wed, 24 Oct 2012 10:21:34 +0000 (12:21 +0200)]

Update memory peak stats while loading RDB / AOF.

YAMAMOTO Takashi [Wed, 24 Oct 2012 08:47:56 +0000 (17:47 +0900)]

fix a typo in a comment

antirez [Mon, 22 Oct 2012 17:21:47 +0000 (19:21 +0200)]

A field called slave_read_only added in INFO output.

This was important information missing from the INFO output in the
replication section.

It obviously reflects if the slave is read only or not.

Salvatore Sanfilippo [Mon, 22 Oct 2012 09:55:23 +0000 (02:55 -0700)]

Merge pull request #693 from ghurrell/dict-h-typos

Fix (cosmetic) typos in dict.h

Schuster [Mon, 22 Oct 2012 09:44:20 +0000 (11:44 +0200)]

redis-check-dump now understands dumps produced by Redis 2.6

(Commit message from @antirez as it was missing in the original commits,
also the patch was modified a bit to still work with 2.4 dumps and to
avoid if expressions that are always true due to checked types range)

This commit changes redis-check-dump to account for new encodings and
for the new MSTIME expire format. It also refactors the test for valid
type into a function.

The code is still compatible with Redis 2.4 generated dumps.

This fixes issue #709.

antirez [Mon, 22 Oct 2012 08:43:39 +0000 (10:43 +0200)]

Default memory limit for 32bit instances moved from 3.5 GB to 3 GB.

On some systems, notably OS X, the 3.5 GB limit was too high and not able
to prevent a crash for out of memory. The 3 GB limit works better and it
is still a lot of memory within a 4 GB theoretical limit so it's not going
to bore anyone :-)

This fixes issue #711

antirez [Mon, 22 Oct 2012 08:28:54 +0000 (10:28 +0200)]

Differentiate SCRIPT KILL error replies.

When calling SCRIPT KILL currently you can get two errors:

* No script in timeout (busy) state.
* The script already performed a write.

It is useful to be able to distinguish the two errors, but right now both
start with "ERR" prefix, so string matching (that is fragile) must be used.

This commit introduces two different prefixes.

-NOTBUSY and -UNKILLABLE respectively to reply with an error when no
script is busy at the moment, and when the script already executed a
write operation and can not be killed.

antirez [Tue, 16 Oct 2012 15:35:50 +0000 (17:35 +0200)]

Fix MULTI / EXEC rendering in MONITOR output.

Before this commit it used to be like this:

MULTI
EXEC
... actual commands of the transaction ...

Because after all that is the natural order of things. Transaction
commands are queued and executed *only after* EXEC is called.

However this makes debugging with MONITOR a mess, so the code was
modified to provide a coherent output.

What happens is that MULTI is rendered in the MONITOR output as soon as
possible, while EXEC is propagated only after the transaction is
executed, or even in the case it fails because of WATCH, so in this case
you'll simply see:

MULTI
EXEC

An empty transaction.

antirez [Thu, 11 Oct 2012 16:34:05 +0000 (18:34 +0200)]

Allow AUTH when Redis is busy because of a timed out Lua script.

If the server is password protected we need to accept AUTH when there is
a server busy (-BUSY) condition, otherwise it will be impossible to send
SHUTDOWN NOSAVE or SCRIPT KILL.

This fixes issue #708.

Salvatore Sanfilippo [Wed, 10 Oct 2012 09:18:14 +0000 (02:18 -0700)]

Merge pull request #707 from NanXiao/patch-1

Update src/redis-benchmark.c

NanXiao [Wed, 10 Oct 2012 09:08:43 +0000 (17:08 +0800)]

Update src/redis-benchmark.c

The current implementation contains:

    if (c->pending == 0) clientDone(c);

clientDone() frees the memory of c, but the loop then continues with
while(c->pending). Since c is now an invalid pointer, reading c->pending
is invalid too, and this causes a core dump on some platforms (e.g.
Solaris).

So I think the code should be modified as follows:

    if (c->pending == 0) {
        clientDone(c);
        break;
    }

so that while(c->pending) is not evaluated again.

antirez [Sat, 6 Oct 2012 10:04:27 +0000 (12:04 +0200)]

CONTRIBUTING file updated.

dvir volk [Fri, 8 Jun 2012 13:03:18 +0000 (16:03 +0300)]

fixed server install script to rewrite the default configuration file and not a template, and removed the old config template

Conflicts:

utils/redis.conf.tpl

antirez [Wed, 3 Oct 2012 17:14:46 +0000 (19:14 +0200)]

Hash function switched to murmurhash2.

The previously used hash function, djbhash, is not secure against
collision attacks even when the seed is randomized as there are simple
ways to find seed-independent collisions.

The new hash function appears to be safe (or much harder to exploit at
least) in this case, and has better distribution.

Better distribution does not always mean better performance. For instance in
a fast benchmark with "DEBUG POPULATE 1000000" I obtained the following
results:

1.6 seconds with djbhash
2.0 seconds with murmurhash2

This is due to the fact that djbhash will hash objects that follow the
pattern `prefix:<id>`, and where the id is numerically near, to nearby
buckets. This improves the locality.
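
The locality effect is easy to reproduce with the classic djb
"times 33" string hash. This is a sketch: the seed 5381 is the
conventional starting value and an assumption here, and bucket_of is a
hypothetical helper for a table whose size is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Classic djb string hash: h = h * 33 + c for every byte. */
unsigned int djb_hash(const char *s) {
    unsigned int h = 5381;
    assert(s != NULL);
    while (*s) h = ((h << 5) + h) + (unsigned char)*s++;
    return h;
}

/* Bucket index in a hash table with a power-of-two size. */
unsigned int bucket_of(const char *key, unsigned int table_size) {
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);
    return djb_hash(key) & (table_size - 1);
}
```

Since the last byte is simply added to the running value, two keys that
differ only in the final digit, like "prefix:1" and "prefix:2", get
hash values that differ by one and so land in adjacent buckets.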

However in other access patterns with keys that have no relation
murmurhash2 has some (apparently minimal) speed advantage.

On the other hand a better distribution should significantly
improve the quality of the distribution of elements returned with
dictGetRandomKey() that is used in SPOP, SRANDMEMBER, RANDOMKEY, and
other commands.

Everything considered, and under the suspicion that this commit fixes a
security issue in Redis, we are switching to the new hash function.
If some serious speed regression is found in the future we'll be able
to step back easily.

This commit fixes issue #663.

antirez [Fri, 5 Oct 2012 08:48:49 +0000 (10:48 +0200)]

Warn when configured maxmemory value seems odd.

This commit warns the user with a log at "warning" level if:

1) After the server startup the maxmemory limit was found to be < 1MB.
2) After a CONFIG SET command modifying the maxmemory setting the limit
is set to a value that is smaller than the currently used memory.

The behaviour of the Redis server is unmodified, and this will not make
the CONFIG SET command or a wrong configuration in redis.conf less
likely to create problems, but at least this will make most users aware
of a possible error they committed without resorting to external
help.

However no warning is issued if, as a result of loading the AOF or RDB
file, we are very near the maxmemory setting, or key eviction will be
needed in order to go under the specified maxmemory setting. The reason
is that in servers configured as a cache with an aggressive
maxmemory-policy most of the time restarting the server will cause this
condition to happen if persistence is not switched off.

This fixes issue #429.

antirez [Fri, 5 Oct 2012 08:10:34 +0000 (10:10 +0200)]

Include time.h in ae.c as we now use time().

Jokea [Thu, 30 Aug 2012 07:08:19 +0000 (15:08 +0800)]

Force expire all timer events when system clock skew is detected.

When the system time is moved back, the timers will not work properly,
hence some core functionality of Redis will stop working (e.g.
replication, bgsave, etc). See issue #633 for details.

The patch saves the previous time and when a system clock skew is detected,
it will force expire all timers.

Modified by @antirez: the previous time was moved into the eventLoop
structure to make sure the library is still thread safe as long as you
use different event loops in different threads (otherwise you need
some synchronization). More comments were added about the reasoning
behind the patch, which is worth reporting here:

/* If the system clock is moved to the future, and then set back to the
 * right value, time events may be delayed in a random way. Often this
 * means that scheduled operations will not be performed soon enough.
 *
 * Here we try to detect system clock skews, and force all the time
 * events to be processed ASAP when this happens: the idea is that
 * processing events earlier is less dangerous than delaying them
 * indefinitely, and practice suggests it is. */
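
The detection described in the comment can be sketched as follows. The
structures are simplified stand-ins (an array instead of the event
list) and all the names are hypothetical; the current time is passed in
as a parameter so the logic is easy to exercise.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Simplified time event: only the due time in seconds is kept. */
typedef struct { long when_sec; } time_event;

typedef struct {
    time_t last_time;       /* time seen at the previous check */
    time_event *events;
    size_t count;
} event_loop;

/* If the clock moved backwards since the last call, make every time
 * event due immediately. Returns 1 when a skew was detected. */
int check_clock_skew(event_loop *el, time_t now) {
    assert(el != NULL);
    int skew = now < el->last_time;
    if (skew) {
        for (size_t i = 0; i < el->count; i++) el->events[i].when_sec = 0;
    }
    el->last_time = now;
    return skew;
}
```

In an event loop this would run with time(NULL) as `now`, right before
the time events are processed.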

antirez [Thu, 4 Oct 2012 09:49:17 +0000 (11:49 +0200)]

"Timeout receiving bulk data" error message modified.

The new message now contains a hint about modifying the repl-timeout
configuration directive if the problem persists.

This should normally not be needed, because while the master generates
the RDB file it makes sure to send newlines to the replication channel
to prevent timeouts. However there are times when masters running on
very slow systems can completely stop for seconds during the RDB saving
process. In such a case enlarging the timeout value can fix the problem.

See issue #695 for an example of this problem in an EC2 deployment.

antirez [Wed, 3 Oct 2012 09:41:08 +0000 (11:41 +0200)]

"SORT by nosort" (skip sorting) respect sorted set ordering.

When SORT is called with the option BY set to a string constant not
including the wildcard character "*", there is no way to sort the output
so any ordering is valid. This allows the SORT internals to optimize
their work and not really sort the output at all.

However it was odd that this option was not able to retain the natural
order of a sorted set. This feature was requested by users multiple
times, as calling SORT with GET against sorted sets can be a handy way
to mass-fetch objects.

This commit introduces two things:

1) The ability of SORT to return sorted set elements in their natural
ordering when `BY nosort` is specified, according to the `DESC / ASC` options.
2) The ability of SORT to optimize this case further if LIMIT is passed
as well, avoiding fetching the whole sorted set, and directly
obtaining the specified range.

Because in this case the sorting is always deterministic, no
post-sorting activity is performed when SORT is called from a Lua
script.

This commit fixes issue #98.

Greg Hurrell [Wed, 3 Oct 2012 04:58:36 +0000 (21:58 -0700)]

Fix (cosmetic) typos in dict.h

antirez [Mon, 1 Oct 2012 08:10:03 +0000 (10:10 +0200)]

Revert "Scripting: redis.NIL to return nil bulk replies."

This reverts commit e061d797d739f2beeb22b9e8ac519d1df070e3a8.

Conflicts:

src/scripting.c

antirez [Fri, 28 Sep 2012 14:54:57 +0000 (16:54 +0200)]

Scripting: add helper functions redis.error_reply() and redis.status_reply().

A previous commit introduced redis.NIL. This commit adds similar helper
functions to return tables with a single field set to the specified
string so that instead of using 'return {err="My Error"}' it is possible
to use a more idiomatic form:

return redis.error_reply("My Error")
return redis.status_reply("OK")

antirez [Fri, 28 Sep 2012 12:19:15 +0000 (14:19 +0200)]

Scripting: redis.NIL to return nil bulk replies.

Lua arrays can't contain nil elements (see
http://www.lua.org/pil/19.1.html for more information), so Lua scripts
were not able to return a multi-bulk reply containing nil bulk
elements inside.

This commit introduces a special conversion: a table with just
a "nilbulk" field set to a boolean value is converted by Redis as a nil
bulk reply, but at the same time for Lua this type is not a "nil" so it
can be used inside Lua arrays.

This type is also assigned to redis.NIL, so the following two forms
are equivalent and will be able to return a nil bulk reply as the second
element of a three-element array:

    EVAL "return {1,redis.NIL,3}" 0
    EVAL "return {1,{nilbulk=true},3}" 0

The result in redis-cli will be:

    1) (integer) 1
    2) (nil)
    3) (integer) 3

antirez [Wed, 26 Sep 2012 16:59:54 +0000 (18:59 +0200)]

Sentinel: Support for AUTH.

antirez [Fri, 21 Sep 2012 09:33:06 +0000 (11:33 +0200)]

Test for SRANDMEMBER with <count>.

antirez [Thu, 20 Sep 2012 14:33:36 +0000 (16:33 +0200)]

SRANDMEMBER <count> leak fixed.

For "CASE 4" (see code) we need to free the element if it's already in
the result dictionary and adding it failed.

antirez [Wed, 19 Sep 2012 19:29:40 +0000 (21:29 +0200)]

Added the SRANDMEMBER key <count> variant.

SRANDMEMBER called with just the key argument can only return a single
random element from a Redis Set. However many users need to return
multiple unique elements from a Set. This is not a trivial problem to
handle on the client side, and for truly good performance a C
implementation was required.

After many requests for this feature it was finally implemented.

The problem implementing this command is the strategy to follow when
the number of elements the user asks for is near to the number of
elements that are already inside the set. In this case asking the
dictionary API for random elements, and trying to add them to a
temporary set, may result in extremely poor performance, as most add
operations will be wasted on duplicated elements.

For this reason this implementation uses a different strategy in this
case: the Set is copied, and random elements are returned to reach the
specified count.

The code actually uses 4 different algorithms optimized for the
different cases.

If the count is negative, the command changes behavior and allows for
duplicated elements in the returned subset.
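
To make the "work on a copy" idea concrete, here is a sketch that picks
unique elements from a copy of an array with a partial Fisher-Yates
shuffle. This is a swapped-in textbook technique, not one of the four
algorithms of the commit, and sample_unique is a hypothetical name.
rand() is used for brevity and has a small modulo bias.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy up to `count` distinct elements of `set` (n elements) into
 * `out`, chosen at random. Returns the number of elements written. */
size_t sample_unique(const int *set, size_t n, size_t count, int *out) {
    assert(set != NULL && out != NULL);
    if (count > n) count = n;
    if (count == 0) return 0;

    int *copy = malloc(n * sizeof(*copy));
    if (copy == NULL) return 0;
    memcpy(copy, set, n * sizeof(*copy));

    for (size_t i = 0; i < count; i++) {
        /* Swap a random not-yet-chosen element into position i. */
        size_t j = i + (size_t)rand() % (n - i);
        int tmp = copy[i];
        copy[i] = copy[j];
        copy[j] = tmp;
        out[i] = copy[i];
    }
    free(copy);
    return count;
}
```

Every step selects from the elements that were not chosen yet, so no
work is wasted on duplicates even when count is close to n.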

antirez [Mon, 17 Sep 2012 10:45:57 +0000 (12:45 +0200)]

Fix compilation on FreeBSD. Thanks to @koobs on twitter.

antirez [Mon, 17 Sep 2012 08:45:56 +0000 (10:45 +0200)]

.gitignore modified to be more general with less entries.

antirez [Tue, 4 Sep 2012 08:37:49 +0000 (10:37 +0200)]

A reimplementation of blocking operation internals.

Redis provides support for blocking operations such as BLPOP or BRPOP.
These operations are identical to normal LPOP and RPOP operations as long
as there are elements in the target list, but if the list is empty they
block waiting for new data to arrive to the list.

All the clients blocked waiting for the same list are served in a FIFO
way, so the first that blocked is the first to be served when there is
more data pushed by another client into the list.

The previous implementation of blocking operations was conceived to
serve clients in the context of push operations. For instance:

1) There is a client "A" blocked on list "foo".
2) The client "B" performs `LPUSH foo somevalue`.
3) The client "A" is served in the context of the "B" LPUSH,
synchronously.

Processing things in a synchronous way was useful because if "B" pushes
a value that is immediately served to "A", from the point of view of the
database it is a NOP (no operation), that is, nothing is replicated,
nothing is written in the AOF file, and so forth.

However later we implemented two things:

1) Variadic LPUSH that could add multiple values to a list in the
context of a single call.
2) BRPOPLPUSH that was a version of BRPOP that also provided a "PUSH"
side effect when receiving data.

This forced us to make the synchronous implementation more complex. If
client "A" is waiting for data, and "B" pushes three elements in a
single call, we needed to propagate an LPUSH with a missing argument
in the AOF and replication link. We also needed to make sure to
replicate the LPUSH side of BRPOPLPUSH, but only if it did not in turn
happen to serve another blocked client on another list ;)

This was complex, but with a few mutually recursive functions
everything worked as expected... until one day we introduced scripting
in Redis.

Scripting + synchronous blocking operations = Issue #614.

Basically you can't "rewrite" a script to have just a partial effect on
the replicas and AOF file if the script happened to serve a few blocked
clients.

The solution to all these problems, implemented by this commit, is to
change the way we serve blocked clients. Instead of serving the blocked
clients synchronously, in the context of the command performing the PUSH
operation, it is now an asynchronous and iterative process:

1) If a key that has clients blocked waiting for data is the subject of
a list push operation, we simply mark the key as "ready" and put it into
a queue.
2) Every command pushing stuff on lists, such as a variadic LPUSH, a
script, or whatever it is, is replicated verbatim without any rewriting.
3) Every time a Redis command, a MULTI/EXEC block, or a script
completes its execution, we run the list of keys ready to serve blocked
clients (as more data arrived), and process this list serving the
blocked clients.
4) As a result of "3" maybe more keys are ready again for other clients
(as a result of BRPOPLPUSH we may have push operations), so we iterate
back to step "3" if it's needed.
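
The iterative process in the steps above can be sketched with a toy
model. Everything here is an assumption made for illustration: keys are
small integers, only the list length is tracked, each key has at most
one blocked client, and a blocked client with dest >= 0 behaves like
BRPOPLPUSH by pushing into another key when served.

```c
#include <assert.h>

#define NKEYS 4

typedef struct {
    int len;      /* elements currently in the list */
    int blocked;  /* 1 if a client is blocked on this key */
    int dest;     /* destination key of the blocked client, or -1 */
    int ready;    /* 1 if the key is already in the ready queue */
} toy_key;

static toy_key keys[NKEYS];
static int ready_queue[NKEYS], ready_len = 0;

/* Register a client blocked on key k (dest = -1 for a plain pop). */
void toy_block(int k, int dest) {
    assert(k >= 0 && k < NKEYS && dest < NKEYS);
    keys[k].blocked = 1;
    keys[k].dest = dest;
}

/* Step 1: a push only stores data and marks the key as ready. */
void toy_push(int k, int n) {
    assert(k >= 0 && k < NKEYS && n > 0);
    keys[k].len += n;
    if (keys[k].blocked && !keys[k].ready) {
        keys[k].ready = 1;
        ready_queue[ready_len++] = k;
    }
}

/* Steps 3 and 4: after a command completes, serve blocked clients and
 * repeat while serving them made more keys ready. Returns the number
 * of clients served. */
int toy_serve_blocked(void) {
    int served = 0;
    while (ready_len > 0) {
        int batch[NKEYS], n = ready_len;
        for (int i = 0; i < n; i++) batch[i] = ready_queue[i];
        ready_len = 0;
        for (int i = 0; i < n; i++) {
            toy_key *key = &keys[batch[i]];
            key->ready = 0;
            if (!key->blocked || key->len == 0) continue;
            key->len--;                 /* pop for the blocked client */
            key->blocked = 0;
            served++;
            if (key->dest >= 0) toy_push(key->dest, 1);
        }
    }
    return served;
}
```

Note that toy_push never serves anybody itself, so the pushing command
can be replicated exactly as it was received.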

The new code has much simpler semantics and an implementation that is
simpler to understand, with the disadvantage of not being able to
"optimize out" a PUSH+BPOP as a no-op.

This commit will be tested with care before the final merge, and more
tests will likely be added.

antirez [Tue, 11 Sep 2012 08:32:04 +0000 (10:32 +0200)]

Make sure that SELECT argument is an integer or return an error.

Unfortunately we still had the lame atoi() without any error checking in
place, so "SELECT foo" would work as "SELECT 0". This was not a huge
problem per se, but some people expected that DB names can be strings and
not just numbers, and without errors you get the feeling that they can be
strings, but not the behavior.

Now getLongFromObjectOrReply() is used as almost everybody else across
the code, generating an error if the number is not an integer or
overflows the long type.
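
The difference from atoi() can be sketched with strtol() and explicit
error checks. parse_long_strict is a hypothetical helper shown for
illustration; it is not the Redis function mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse `s` as a base-10 long. Returns 1 and stores the value on
 * success, 0 if `s` is not a number, has trailing characters, or
 * overflows the long type. */
int parse_long_strict(const char *s, long *out) {
    char *end;
    assert(s != NULL && out != NULL);

    errno = 0;
    long v = strtol(s, &end, 10);
    if (end == s) return 0;          /* no digits at all */
    if (*end != '\0') return 0;      /* trailing characters */
    if (errno == ERANGE) return 0;   /* out of range for long */
    *out = v;
    return 1;
}
```

With this kind of check "foo" is rejected with an error, while atoi()
silently turns it into 0. Like strtol() itself, the sketch still
accepts leading whitespace.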

Thanks to @mipearson for reporting that on Twitter.

antirez [Mon, 10 Sep 2012 10:42:55 +0000 (12:42 +0200)]

Match printf format with actual type in genRedisInfoString().

antirez [Wed, 5 Sep 2012 15:46:06 +0000 (17:46 +0200)]

BITCOUNT regression test for #582 fixed for 32 bit target.

Bug #582 was not present in 32 bit builds of Redis as
getLongFromObjectOrReply() will return an error for overflow.

This commit makes sure that the test does not fail because of the error
returned when running against 32 bit builds.

Haruto Otake [Sun, 15 Jul 2012 09:38:30 +0000 (18:38 +0900)]

BITCOUNT: fix segmentation fault.

Remove an unsafe and unnecessary cast.
Until now, this cast could lead to a segmentation fault when end > UINT_MAX:

setbit foo 0 1
bitcount foo 0 4294967295
=> ok
bitcount foo 0 4294967296
=> causes a segmentation fault.

Note by @antirez: the commit was modified a bit to also change the
string length type to long, since it's guaranteed to be at max 512 MB in
size, so we can work with the same type across all the code path.

A regression test was also added.

Salvatore Sanfilippo [Wed, 5 Sep 2012 13:59:37 +0000 (06:59 -0700)]

Merge pull request #576 from saj/fix-slave-ping-period

Bug fix: slaves being pinged every second

antirez [Tue, 4 Sep 2012 23:12:41 +0000 (01:12 +0200)]

Scripting: Force SORT BY constant determinism inside SORT itself.

SORT is able to return (faster than when ordering) unordered output if
the "BY" clause is used with a constant value. However we try to play
well with scripting requirements of determinism providing always sorted
outputs when SORT (and other similar commands) are called by Lua
scripts.

However we used the general mechanism in place in scripting in order to
reorder SORT output, that is, if the command has the "S" flag set, the
Lua scripting engine will take an additional step when converting a
multi bulk reply to Lua value, calling a Lua sorting function.

This is suboptimal as we can do it faster inside SORT itself.
This is also broken as issue #545 shows us: basically when SORT is used
with a constant BY, and additionally also GET is used, the Lua scripting
engine was trying to order the output as a flat array, while it was
actually a list of key-value pairs.

What we do now is to recognize if the caller of SORT is the Lua client
(since we can check this using the REDIS_LUA_CLIENT flag). If so, and if
a "don't sort" condition is triggered by the BY option with a constant
string, we force the lexicographical sorting.

This commit fixes this bug and improves the performance, and at the same
time simplifies the implementation. This does not mean I'm smart today,
it means I was stupid when I committed the original implementation ;)

antirez [Tue, 4 Sep 2012 14:06:53 +0000 (16:06 +0200)]

Sentinel: reply -IDONTKNOW to get-master-addr-by-name on lack of info.

If we don't have any clue about a master since it never replied to INFO
so far, reply with an -IDONTKNOW error to SENTINEL
get-master-addr-by-name requests.

commit | commitdiff | tree

antirez [Tue, 4 Sep 2012 13:52:04 +0000 (15:52 +0200)]

Sentinel: easier master redirection if master is a slave.

Before this commit Sentinel used to redirect to the reported master
ip/addr only if the instance reported itself as a slave in the first
INFO output received.

Now we also redirect to the reported master ip/addr if we find that the
runid is different and the reported role is slave.

This unifies the behavior of Sentinel in the case of a reboot (where it
will see the first INFO output with the wrong role and will perform the
redirection), with the behavior of Sentinel in the case of a change in
what it sees in the INFO output of the master.
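
The redirection rule described above can be written down as a predicate. This
is a Python sketch only; `should_redirect` and its parameters are hypothetical
names, not identifiers from sentinel.c.

```python
def should_redirect(is_first_info, known_runid, reported_runid, reported_role):
    """Decide whether to follow the master address reported by an instance
    that we are monitoring as a master."""
    if reported_role != "slave":
        return False
    # Old behavior: only the first INFO output could trigger a redirection.
    # New behavior: a changed runid triggers it as well.
    return is_first_info or known_runid != reported_runid


print(should_redirect(False, "abc", "def", "slave"))
```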

commit | commitdiff | tree

antirez [Fri, 31 Aug 2012 13:32:57 +0000 (15:32 +0200)]

Send an async PING before starting replication with master.

During the first synchronization step of the replication process, a Redis
slave connects with the master in a non blocking way. However once the
connection is established the replication continues sending the REPLCONF
command, and sometimes the AUTH command if needed. Those commands are
sent in a partially blocking way (blocking with a timeout in the order of
seconds).

Because it is common for a blocked master to accept connections even if
it is actually not able to reply to the slave requests, it was easy for
a slave to block if the master had serious issues, but was still able to
accept connections in the listening socket.

For this reason we now send an asynchronous PING request just after the
non blocking connection ended in a successful way, and wait for the
reply before continuing with the replication process. It is very
unlikely that a master replying to PING can't reply to the other
commands.

This solution was proposed by Didier Spezia (Thanks!) so that we don't
need to turn all the replication process into a non blocking affair, while
the probability of a blocked slave remains minimal even in the event of
a failing master.

Also we now use getsockopt(SO_ERROR) in order to check errors ASAP
in the event handler, instead of waiting for actual I/O to return an
error.

This commit fixes issue #632.
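
A rough Python sketch of the two checks described above: reading the pending
socket error with getsockopt(SO_ERROR), then probing the peer with PING and
waiting for a reply with a timeout. The function names are hypothetical, the
real code is C driven by the event loop rather than a blocking select(), and
accepting any status or error line as a sign of life is an assumption of this
sketch.

```python
import select
import socket


def connect_error(sock):
    """Return the pending socket error (0 means no error)."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)


def ping_handshake(sock, timeout):
    """Send PING and report whether a reply arrives within `timeout` seconds."""
    if connect_error(sock) != 0:
        return False
    sock.sendall(b"PING\r\n")
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return False  # the peer accepted the connection but is not replying
    data = sock.recv(64)
    # Assumption: "+..." and "-..." lines both prove the peer is responsive.
    return data[:1] in (b"+", b"-")


slave_side, master_side = socket.socketpair()
master_side.sendall(b"+PONG\r\n")
print(ping_handshake(slave_side, 1.0))
```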

commit | commitdiff | tree

antirez [Fri, 31 Aug 2012 09:08:53 +0000 (11:08 +0200)]

Scripting: Reset Lua fake client reply_bytes after command execution.

Lua scripting uses a fake client in order to run commands in the context
of a client, accumulate the reply, and convert it into a Lua object
to return to the caller. This client is reused again and again, and is
referenced by the server.lua_client globally accessible pointer.

However after every call to redis.call() or redis.pcall(), that is
handled by the luaRedisGenericCommand() function, the reply_bytes field
of the client was not set back to zero. This field is used to estimate
the amount of memory currently used in the reply. Because of the lack of
reset, script after script executed, this value used to get bigger and
bigger, and in the end on 32 bit systems it triggered the following
assert:

redisAssert(c->reply_bytes < ULONG_MAX-(1024*64));

On 64 bit systems this does not happen because it takes too much time to
reach values near to 2^64 for users to see the practical effect of the
bug.

Now in the cleanup stage of luaRedisGenericCommand() we reset the
reply_bytes counter to zero, avoiding the issue. It is not practical to
add a test for this bug, but the fix was manually tested using a
debugger.

This commit fixes issue #656.
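
To see why the counter only bites on 32 bit systems, here is a small Python
model of the assert quoted above. It assumes a 32 bit unsigned long, and
`calls_before_assert` is a hypothetical helper, not a Redis function.

```python
ULONG_MAX_32 = 2**32 - 1             # assumption: 32 bit unsigned long
LIMIT = ULONG_MAX_32 - (1024 * 64)   # threshold used by the assert


def calls_before_assert(reply_size, reset_after_call, max_calls):
    """Return how many calls complete before the assert would fail,
    or None if it never fails within max_calls."""
    reply_bytes = 0
    for completed in range(max_calls):
        if not reply_bytes < LIMIT:
            return completed
        reply_bytes += reply_size    # reply accumulated by one command
        if reset_after_call:
            reply_bytes = 0          # the fix: reset in the cleanup stage
    return None


# With 1 MB replies and no reset the assert fails after 4096 calls.
print(calls_before_assert(2**20, False, 10000))
print(calls_before_assert(2**20, True, 10000))
```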

commit | commitdiff | tree

antirez [Fri, 31 Aug 2012 08:22:21 +0000 (10:22 +0200)]

Scripting: require at least one argument for redis.call().

Redis used to crash with a call like the following:

EVAL "redis.call()" 0

Now the explicit check for at least one argument prevents the problem.

This commit fixes issue #655.

commit | commitdiff | tree

antirez [Thu, 30 Aug 2012 15:57:02 +0000 (17:57 +0200)]

Sentinel: do not crash against slaves not publishing the runid.

Older versions of Redis (before 2.4.17) don't publish the runid field in
INFO. This commit makes Sentinel able to handle that without crashing.

commit | commitdiff | tree

antirez [Wed, 29 Aug 2012 10:44:24 +0000 (12:44 +0200)]

Sentinel: INFO command implementation.

commit | commitdiff | tree

antirez [Wed, 29 Aug 2012 09:44:01 +0000 (11:44 +0200)]

Sentinel: add Redis execution mode to INFO output.

The new "redis_mode" field in the INFO output will show if Redis is
running in standalone, cluster, or sentinel mode.

commit | commitdiff | tree

antirez [Tue, 28 Aug 2012 15:53:18 +0000 (17:53 +0200)]

Sentinel: added documentation about slave-priority in redis.conf

commit | commitdiff | tree

antirez [Tue, 28 Aug 2012 15:45:01 +0000 (17:45 +0200)]

Sentinel: Sentinel-side support for slave priority.

The slave priority that is now published by Redis in INFO output is
now used by Sentinel in order to select the slave with minimum priority
for promotion, and in order to consider slaves with priority set to 0 as
not able to play the role of master (they will never be promoted by
Sentinel).

The "slave-priority" field is now one of the fields that Sentinel
publishes when describing an instance via the SENTINEL commands such as
"SENTINEL slaves mastername".
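
The priority rule alone can be sketched as follows. This is Python for
illustration; `select_slave` is a hypothetical name, the dict layout is an
assumption, and any other criteria Sentinel applies during slave selection are
left out.

```python
def select_slave(slaves):
    """Pick the promotion candidate by priority only.

    slaves: list of dicts with "name" and "slave-priority" keys.
    Returns the chosen dict, or None if no slave can be promoted.
    """
    # Priority 0 means "never promote this slave".
    candidates = [s for s in slaves if s["slave-priority"] != 0]
    if not candidates:
        return None
    # The lowest non-zero priority wins.
    return min(candidates, key=lambda s: s["slave-priority"])


slaves = [
    {"name": "s1", "slave-priority": 100},
    {"name": "s2", "slave-priority": 10},
    {"name": "s3", "slave-priority": 0},
]
print(select_slave(slaves)["name"])
```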

commit | commitdiff | tree

antirez [Tue, 28 Aug 2012 15:20:26 +0000 (17:20 +0200)]

Sentinel: Redis-side support for slave priority.

A Redis slave can now be configured with a priority, that is an integer
number that is shown in INFO output and can be read and set using the
redis.conf file or the CONFIG GET/SET command.

This field is used by Sentinel during slave election. A slave with lower
priority is preferred. A slave with priority zero is never elected (and
is considered to be impossible to elect even if it is the only slave
available).

A following commit will add support on the Sentinel side as well.

commit | commitdiff | tree

antirez [Tue, 28 Aug 2012 10:56:05 +0000 (12:56 +0200)]

Sentinel: suppress harmless warning by initializing 'table' to NULL.

Note that the assertion guarantees that one of the if branches setting
table is always entered.

commit | commitdiff | tree

antirez [Fri, 24 Aug 2012 17:28:44 +0000 (19:28 +0200)]

Incrementally flush RDB on disk while loading it from a master.

This fixes issue #539.

Basically if there is enough free memory the OS may buffer the RDB file
that the slave transfers to disk from the master. The file may
actually be flushed to disk all at once by the operating system when it
gets closed by Redis, causing the close system call to block for a long time.

This patch is a modified version of one provided by yoav-steinberg of
@garantiadata (the original version was posted in the issue #539
comments), and tries to flush the OS buffers incrementally (every 8 MB
of loaded data).
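
The idea of flushing every 8 MB instead of once at close time can be sketched
like this. It is a Python illustration, not the patch itself: the function
name is hypothetical, and `os.fsync` is used as a portable stand-in because
the commit message does not say which system call the patch relies on.

```python
import os

FLUSH_EVERY = 8 * 1024 * 1024  # 8 MB, the interval named in the commit


def write_with_incremental_sync(path, chunks, flush_every=FLUSH_EVERY):
    """Write chunks to path, forcing data to disk every `flush_every` bytes.

    Returns the number of incremental syncs performed.
    """
    written = 0
    last_sync = 0
    syncs = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
            if written - last_sync >= flush_every:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to write it to disk now
                last_sync = written
                syncs += 1
    return syncs
```

Each sync then has at most about `flush_every` bytes of dirty data to write,
so the final close has little left to flush.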

commit | commitdiff | tree

antirez [Fri, 24 Aug 2012 13:40:22 +0000 (15:40 +0200)]

Fix a forgotten zmalloc_oom() -> zmalloc_oom_handler() replacement.

commit | commitdiff | tree

antirez [Fri, 24 Aug 2012 10:55:37 +0000 (12:55 +0200)]

Better Out of Memory handling.

The previous implementation of zmalloc.c was not able to handle out of
memory in an application-specific way. It just logged an error on
standard error, and aborted.

The result was that in the case of an actual out of memory in Redis
where malloc returned NULL (In Linux this actually happens under
specific overcommit policy settings and/or with no or little swap
configured) the error was not properly logged in the Redis log.

This commit fixes this problem, fixing issue #509.
Now the out of memory is properly reported in the Redis log and a stack
trace is generated.

The approach used is to provide a configurable out of memory handler
to zmalloc (otherwise the default one logging the event on the
standard error is used).
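
The configurable handler pattern looks roughly like this. It is a Python
model, not zmalloc.c: all names are hypothetical, the allocator is passed in
as a function so that failure can be simulated, and the default handler raises
MemoryError as a stand-in for aborting the process.

```python
import sys


def default_oom_handler(size):
    # Stand-in for "log on standard error and abort".
    sys.stderr.write("Out of memory trying to allocate %d bytes\n" % size)
    raise MemoryError(size)


_oom_handler = default_oom_handler


def set_oom_handler(handler):
    """Install an application-specific out of memory handler."""
    global _oom_handler
    _oom_handler = handler


def checked_alloc(size, allocate):
    """Call allocate(size); on failure (None) invoke the current handler."""
    block = allocate(size)
    if block is None:
        _oom_handler(size)
    return block


# The application can now log the event its own way.
events = []
set_oom_handler(lambda size: events.append("OOM allocating %d bytes" % size))
checked_alloc(16, lambda n: None)
print(events)
```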

commit | commitdiff | tree

antirez [Fri, 24 Aug 2012 10:29:54 +0000 (12:29 +0200)]

Sentinel: send SCRIPT KILL on -BUSY reply and SDOWN instance.

From the point of view of Redis an instance replying -BUSY is down,
since it is effectively not able to reply to user requests. However
a looping script is a recoverable condition in Redis if the script has
not yet performed any write to the dataset. In that case performing a
fail over is not optimal, so Sentinel now tries to restore the normal server
condition by killing the script with a SCRIPT KILL command.

If the script already performed some write before entering an infinite
(or long enough to timeout) loop, SCRIPT KILL will not work and the
fail over will be triggered anyway.
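
The decision described above, as a small Python sketch. Both functions are
hypothetical names used for illustration and model only the conditions spelled
out in this message.

```python
def sentinel_reaction(reply, is_sdown):
    """Return the command Sentinel sends to the instance, or None."""
    if is_sdown and reply.startswith("-BUSY"):
        return "SCRIPT KILL"
    return None


def failover_still_needed(script_has_written):
    # SCRIPT KILL does not work once the script has written to the
    # dataset, so the instance stays busy and the fail over goes ahead.
    return script_has_written


print(sentinel_reaction("-BUSY Redis is busy running a script", True))
```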

commit | commitdiff | tree

antirez [Fri, 24 Aug 2012 10:10:24 +0000 (12:10 +0200)]

Sentinel: fixed a crash on script execution.

The call to sentinelScheduleScriptExecution() lacked the final NULL
argument to signal the end of arguments. This resulted in a crash.

commit | commitdiff | tree

Salvatore Sanfilippo [Wed, 22 Aug 2012 09:32:27 +0000 (02:32 -0700)]

Merge pull request #628 from pietern/unstable-zip

Fix ziplist edge case

commit | commitdiff | tree

antirez [Tue, 21 Aug 2012 15:31:44 +0000 (17:31 +0200)]

redis-benchmark: disable big buffer cleanup in hiredis context.

This new hiredis feature allows us to reuse a previous context reader
buffer even if it is already very big, in order to maximize performance with
big payloads (usually hiredis re-creates buffers when they are too big
and unused in order to save memory).

commit | commitdiff | tree

antirez [Tue, 21 Aug 2012 15:27:01 +0000 (17:27 +0200)]

hiredis library updated.

This version of hiredis merges modifications of the Redis fork with
latest changes in the hiredis repository.

The same version was pushed to the hiredis repository and will probably
be merged into the master branch shortly.

commit | commitdiff | tree

Pieter Noordhuis [Mon, 13 Aug 2012 19:39:49 +0000 (12:39 -0700)]

Set p to its new offset before modifying it

commit | commitdiff | tree

Pieter Noordhuis [Mon, 13 Aug 2012 21:09:40 +0000 (14:09 -0700)]

Add ziplist test for deleting next to last entries

commit | commitdiff | tree

antirez [Fri, 3 Aug 2012 10:39:13 +0000 (12:39 +0200)]

Sentinel: SENTINEL FAILOVER command implemented.

This command can be used in order to force a Sentinel instance to start
a failover for the specified master, as leader, forcing the failover
even if the master is up.

The commit also adds some minor refactoring and other improvements to
functions already implemented that make them able to work when the
master is not in SDOWN condition. For instance slave selection
assumed that we ask INFO every second to every slave. This is true
only when the master is in SDOWN condition, so slave selection did not
work when the master was not in SDOWN condition.

Redis += OpenLDAP MDB
