Redis Best Practices

Bloom Filter Pattern

Bloom filters are an interesting probabilistic data structure that have can be used to see if an item has never been added previously. The curious wording here is intentional. It is probabilistic in that it is possible to have false positives but not false negatives. Bloom filters provide a much more compact and faster way of getting a quick presence check than storing all items in a set and call SISMEMBER.

Bloom filters work by running an item through a quick hashing function and sampling bits from that hash and setting them from a 0 to 1 at particular interval in a bitfield. To check for existence in a Bloom filter, the same bits are sampled. Many item may have bits that overlap, but since a hashing function produce unique identifiers, if a single bit from the hash is still a 0, then we know it has not been previously added.

Bloom filters have been used with Redis for many years via client side libraries that leveraged GETBIT and SETBIT to work with a bitfield at a key. Thankfully, since Redis 4.0, the ReBloom module has been available which takes away any Bloom filter implementation overhead.

A good use case for a Bloom filter is to check for an already used username. On a small scale, this is no problem, but as a service grows, this can be very taxing on a database. It is very simple to implement this with a ReBloom.

First, let’s add a handful of usernames as a test:

> BF.ADD usernames funnyfred
(integer) 1
> BF.ADD usernames fredisfunny
(integer) 1
> BF.ADD usernames fred
(integer) 1
> BF.ADD usernames funfred
(integer) 1

Now, let’s run some test versus the Bloom filter.

> BF.EXISTS usernames fred
(integer) 1
> BF.EXISTS usernames fred_is_funny
(integer) 0

As expected, fred_is_funny yields a 0. A response of zero means we can be sure that this username has not been used. A response of 1 means it might have been used. We can’t say for certain as it might a case of overlapping bits between multiple items.

Generally, the chances of false positives are low, but non-zero. As the Bloom filter “fills up” the chances increase. You can tweak the error rate and size with the BF.RESERVE command.