Rate Limiting Algorithms

The common rate-limiting algorithms, what each one trades off, and the race conditions you have to design around.

distributed-systemsrate-limitingalgorithmsredis

Jan 12, 2026 · 18 min

Why algorithms matter

A rate limiter answers one question on every request: has the caller exceeded their limit? The counting itself is simple. Doing it across distributed processes, surviving concurrent writes, and staying under a millisecond of latency is where it gets interesting.

The algorithm you pick decides:

How much memory each active client costs you
Whether traffic bursts are allowed (and by how much)
How accurately the limit is enforced
How complex the atomic operation has to be

There’s no universally correct choice. Pick the one that matches the traffic shape you actually have. Five algorithms show up over and over, roughly in order of implementation complexity:

Fixed window counter
Sliding window log
Sliding window counter
Token bucket
Leaky bucket

For each, we walk through the concept, a Jedis implementation in Java, the race condition the naive version has, and the Lua script that fixes it. The state store is Redis throughout. It’s the standard choice for rate limiting: in-memory speed, atomic primitives, built-in key expiration.

Fixed window counter

Concept. Divide time into fixed-size windows aligned to wall clock (e.g. one-minute windows). Keep a counter per (client, window). Each request increments the counter; reject if it exceeds the limit. Reset happens automatically via Redis key TTL.

The mental model: one bucket per window, you tally every request that arrives in the current bucket.

boolean allow(String clientId, Jedis jedis) {
    long window = System.currentTimeMillis() / WINDOW_MS;
    String key = "rl:" + clientId + ":" + window;

    long count = jedis.incr(key);
    if (count == 1) {
        jedis.expire(key, (int) (WINDOW_MS / 1000));
    }

    return count <= LIMIT;
}

The race. INCR and EXPIRE are two separate Redis commands. If the process dies between them the key never gets a TTL and stays in Redis forever, a slow memory leak. Under high concurrency, two requests can both INCR a fresh key, both see count == 1, and both call EXPIRE (harmless but wasteful).

The fix is a Lua script. Redis executes Lua atomically against a single key, so the entire INCR + EXPIRE + return happens as one unit:

-- KEYS[1]: rate-limit key
-- ARGV[1]: limit
-- ARGV[2]: window TTL in seconds
local count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return count <= tonumber(ARGV[1]) and 1 or 0

Called from Java:

boolean allow(String clientId, Jedis jedis) {
    long window = System.currentTimeMillis() / WINDOW_MS;
    String key = "rl:" + clientId + ":" + window;

    Long result = (Long) jedis.eval(
        FIXED_WINDOW_SCRIPT, 1, key,
        String.valueOf(LIMIT),
        String.valueOf(WINDOW_MS / 1000)
    );
    return result == 1L;
}

Trade-offs. Memory-cheap (one counter + TTL per active client), trivial to reason about. The flaw is window boundaries: a client can fire LIMIT requests in the last second of one window and LIMIT more in the first second of the next. That’s 2 × LIMIT in two seconds, which a strict reading of “N per minute” wouldn’t allow.

Use when simplicity matters more than precision and occasional 2× bursts at window boundaries are tolerable.

Sliding window log

Concept. Drop the windowing entirely. Keep a log of every request’s timestamp. On each request: drop entries older than now - window, count what’s left, accept if count is below limit and append the new timestamp.

This is the most accurate algorithm. No windows to game, no approximation. The “window” slides continuously with now.

Memory model: one entry per request currently in the window. A 100 req/min limit means up to 100 entries per active client at any given moment. Across millions of clients, memory grows linearly with traffic volume.

boolean allow(String clientId, Jedis jedis) {
    long now = System.currentTimeMillis();
    String key = "rl:" + clientId;

    jedis.zremrangeByScore(key, 0, now - WINDOW_MS);
    long count = jedis.zcard(key);
    if (count >= LIMIT) return false;

    jedis.zadd(key, now, String.valueOf(now));
    jedis.expire(key, (int) (WINDOW_MS / 1000));
    return true;
}

The state is a Redis sorted set (ZSET) where the score is the timestamp. ZREMRANGEBYSCORE does a range-delete on old entries, ZCARD counts what’s left, ZADD records the new request with the current timestamp as both score and value.

The race. Between ZCARD reading the count and ZADD writing the new entry, another request can interleave. Two requests both observe count == LIMIT - 1, both decide they can proceed, both ZADD, and the count ends up at LIMIT + 1. For a strict guarantee, the read-then-write has to be atomic.

Lua:

-- KEYS[1]: rate-limit key
-- ARGV[1]: now (ms)
-- ARGV[2]: window (ms)
-- ARGV[3]: limit
local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])

redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - window)
local count = redis.call('ZCARD', KEYS[1])
if count >= limit then return 0 end

redis.call('ZADD', KEYS[1], now, now)
redis.call('EXPIRE', KEYS[1], math.ceil(window / 1000))
return 1

Trade-offs. Perfect accuracy. Memory cost grows with traffic, which is fine for low-volume, high-stakes operations but expensive at scale.

Use when accuracy is critical and traffic per client is bounded: login attempts, password resets, expensive write operations.

Sliding window counter

Concept. A middle ground between fixed window’s cheapness and sliding window log’s accuracy. Keep counters for the current and previous fixed windows. On each request, compute a weighted count:

weighted = current_count + previous_count × overlap_ratio

where overlap_ratio is how much of the previous window is still inside the rolling now - window interval.

Worked example. Window size 60s, limit 100. We’re 30s into the current window:

Current window count: 50
Previous window count: 80
Overlap ratio: 1 - (30 / 60) = 0.5 (half of the previous window still falls within now - 60s)
Weighted count: 50 + (80 × 0.5) = 90 → allowed (90 < 100)

The approximation assumes the previous window’s requests were evenly distributed in time. In practice the error is small (typically under 1%) and the memory savings are huge.

boolean allow(String clientId, Jedis jedis) {
    long now = System.currentTimeMillis();
    long cur = now / WINDOW_MS;
    long prev = cur - 1;
    double overlap = 1.0 - (now % WINDOW_MS) / (double) WINDOW_MS;

    String curKey = "rl:" + clientId + ":" + cur;
    String prevKey = "rl:" + clientId + ":" + prev;

    String curStr = jedis.get(curKey);
    String prevStr = jedis.get(prevKey);
    long curCount = curStr != null ? Long.parseLong(curStr) : 0;
    long prevCount = prevStr != null ? Long.parseLong(prevStr) : 0;

    if (curCount + prevCount * overlap >= LIMIT) return false;

    long newCount = jedis.incr(curKey);
    if (newCount == 1) {
        jedis.expire(curKey, (int) ((WINDOW_MS * 2) / 1000));
    }
    return true;
}

Note the TTL: WINDOW_MS * 2. The current window’s counter needs to outlive the window itself, because the next window will look it up as the previous-window count.

The race. Same shape as everywhere else: read-then-write across multiple commands isn’t atomic. Lua:

local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local cur = math.floor(now / window)
local prev = cur - 1
local overlap = 1 - (now % window) / window

local curKey = KEYS[1] .. ':' .. cur
local prevKey = KEYS[1] .. ':' .. prev

local curCount = tonumber(redis.call('GET', curKey)) or 0
local prevCount = tonumber(redis.call('GET', prevKey)) or 0

if curCount + prevCount * overlap >= limit then return 0 end

local newCount = redis.call('INCR', curKey)
if newCount == 1 then
    redis.call('EXPIRE', curKey, math.ceil(window * 2 / 1000))
end
return 1

Trade-offs. Two counters per active client (current + previous window). Smooth approximation that captures the “rolling” behavior of sliding window log without its linear memory cost. The approximation breaks when traffic is wildly uneven within a window, but for most workloads the error stays under 1%.

Use when you want sliding-window-feel accuracy at fixed-window cost. Cloudflare uses this exact algorithm for its public rate limiter.

Token bucket

Concept. Each client owns a bucket of CAPACITY tokens. Tokens refill at a fixed REFILL_RATE (tokens per second). Each request consumes one token; if the bucket is empty, reject. Refilling is computed lazily on every request from elapsed time. No background timer.

Per-client state: two numbers. tokens is the current balance; lastRefill is when we last computed it.

The crucial property is burst behavior. If a bucket sits idle, tokens accumulate up to CAPACITY. A client that hasn’t called for a minute can spend the full capacity in one go. Active clients are pinned to the long-term refill rate. This matches how real traffic actually looks: idle periods between bursts.

boolean allow(String clientId, Jedis jedis) {
    long now = System.currentTimeMillis();
    String key = "rl:" + clientId;

    Map<String, String> state = jedis.hgetAll(key);
    long lastRefill = state.containsKey("lastRefill")
        ? Long.parseLong(state.get("lastRefill"))
        : now;
    double tokens = state.containsKey("tokens")
        ? Double.parseDouble(state.get("tokens"))
        : CAPACITY;

    double elapsed = (now - lastRefill) / 1000.0;
    tokens = Math.min(CAPACITY, tokens + elapsed * REFILL_RATE);

    if (tokens < 1) return false;

    Map<String, String> update = new HashMap<>();
    update.put("tokens", String.valueOf(tokens - 1));
    update.put("lastRefill", String.valueOf(now));
    jedis.hset(key, update);
    jedis.expire(key, (int) (CAPACITY / REFILL_RATE + 60));

    return true;
}

State is a Redis hash. TTL is set to roughly the time it takes to refill a full bucket plus a buffer, long enough that we don’t lose state for slow clients.

The race. Token bucket’s race is more dangerous than fixed window’s because every request both reads AND writes state. Two concurrent requests can both read tokens == 1, both decide they can spend their token, both write back tokens == 0. The bucket was only supposed to allow one. Atomicity is essential here, not optional.

Lua:

-- KEYS[1]: rate-limit key
-- ARGV[1]: now (ms)
-- ARGV[2]: capacity
-- ARGV[3]: refill rate (tokens per second)
local now = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local refillRate = tonumber(ARGV[3])

local state = redis.call('HMGET', KEYS[1], 'tokens', 'lastRefill')
local tokens = tonumber(state[1]) or capacity
local lastRefill = tonumber(state[2]) or now

local elapsed = (now - lastRefill) / 1000
tokens = math.min(capacity, tokens + elapsed * refillRate)

if tokens < 1 then return 0 end

redis.call('HMSET', KEYS[1],
    'tokens', tokens - 1,
    'lastRefill', now
)
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refillRate + 60))
return 1

Trade-offs. Memory-cheap (two values per active client). Allows controlled bursts via two independent knobs:

REFILL_RATE is the sustained rate (e.g. 10 req/s)
CAPACITY is the burst size (e.g. 100 tokens, a 10-second burst’s worth)

You can dial each independently to match your service’s tolerance. AWS API Gateway, Stripe’s API rate limiter, and most production API gateways converge on token bucket for this reason.

Use when traffic has natural bursts and you want controlled headroom for those bursts while still enforcing a sustained rate.

Leaky bucket

Concept. Model the rate limiter as a bucket with a hole. Requests fill the bucket; the bucket drains at a constant OUTFLOW_RATE. If a request would overflow CAPACITY, reject it.

Two variants exist:

Leaky bucket as a meter. Track only the bucket fill level, drain it mathematically over time. Functionally similar to token bucket but inverted.
Leaky bucket as a queue. Physically queue requests and dequeue at a constant rate. Used in network traffic shaping, rarely in API rate limiting.

The meter variant is the relevant one for API rate limiting. It’s what you’d actually implement.

boolean allow(String clientId, Jedis jedis) {
    long now = System.currentTimeMillis();
    String key = "rl:" + clientId;

    Map<String, String> state = jedis.hgetAll(key);
    long lastDrain = state.containsKey("lastDrain")
        ? Long.parseLong(state.get("lastDrain"))
        : now;
    double level = state.containsKey("level")
        ? Double.parseDouble(state.get("level"))
        : 0;

    double drained = (now - lastDrain) * OUTFLOW_RATE / 1000.0;
    level = Math.max(0, level - drained);

    if (level >= CAPACITY) return false;

    Map<String, String> update = new HashMap<>();
    update.put("level", String.valueOf(level + 1));
    update.put("lastDrain", String.valueOf(now));
    jedis.hset(key, update);
    jedis.expire(key, (int) (CAPACITY / OUTFLOW_RATE + 60));

    return true;
}

Same race as token bucket: read-then-write on shared state. Lua:

-- KEYS[1]: rate-limit key
-- ARGV[1]: now (ms)
-- ARGV[2]: capacity
-- ARGV[3]: outflow rate (units per second)
local now = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local outflowRate = tonumber(ARGV[3])

local state = redis.call('HMGET', KEYS[1], 'level', 'lastDrain')
local level = tonumber(state[1]) or 0
local lastDrain = tonumber(state[2]) or now

local drained = (now - lastDrain) * outflowRate / 1000
level = math.max(0, level - drained)

if level >= capacity then return 0 end

redis.call('HMSET', KEYS[1],
    'level', level + 1,
    'lastDrain', now
)
redis.call('EXPIRE', KEYS[1], math.ceil(capacity / outflowRate + 60))
return 1

Trade-offs. Functionally similar to token bucket: same state shape, same race, same Lua pattern. The mental model is the meaningful difference:

Token bucket says “you have a budget that refills over time.” It emphasises burst tolerance for the client.
Leaky bucket says “your usage drains at a constant rate.” It emphasises smoothing the load on the downstream system.

If you’re protecting an expensive downstream service that can’t handle spikes, leaky bucket gives a stronger guarantee: the maximum load passing through is exactly OUTFLOW_RATE, regardless of how the client behaves.

Use when the downstream system can’t tolerate bursts. Billing APIs that are expensive to scale, third-party services with strict rate limits, anywhere smoothing matters more than letting bursts through.

Comparison and choosing

Algorithm	Memory	Accuracy	Bursts	Complexity
Fixed window counter	low	low (window-edge bursts)	poor	low
Sliding window log	high (linear with traffic)	perfect	excellent	medium
Sliding window counter	low (2 counters)	very good	good	medium
Token bucket	low (count + timestamp)	medium	excellent (configurable)	low
Leaky bucket (meter)	low (level + timestamp)	medium	smoothed	low

Choosing in practice:

Accuracy is critical, traffic is bounded. Sliding window log. The linear memory cost is acceptable for low-volume, high-stakes operations.
You want smooth, accurate-enough behavior at scale. Sliding window counter. The ~1% approximation error is invisible; memory stays constant.
You want to allow legitimate bursts. Token bucket. Tunable burst capacity and sustained rate as independent knobs. The most common choice for API rate limiting.
You’re protecting a downstream system from spikes. Leaky bucket. Drain rate becomes a hard ceiling on downstream load.
You don’t care about edge cases. Fixed window counter. Cheapest and simplest.

For how these algorithms fit into a system-design view (placement, scaling to 1M RPS, hot keys, HA, dynamic configuration), see Designing a Distributed Rate Limiter.

Production realities

Three things change once you’re running this at real scale.

Lua scripts are non-negotiable

Every algorithm above has a read-then-write pattern in its Jedis form, which means it has a race under concurrent load. At low traffic it’s a theoretical concern; at 1M req/s it materialises constantly. Lua scripts make the operation atomic against a single key. They’re not an optimisation. They’re the correct implementation.

Cache the scripts, call them by hash

EVAL ships the script source over the wire on every call. Wasteful when you’re invoking the same script millions of times. The two-command pattern real services use:

SCRIPT LOAD <source> once at startup. Redis stores the compiled bytecode in an internal cache and returns a SHA-1 hash that identifies it.
EVALSHA <sha> for every call. Only the 40-byte hash travels with each request; the bytecode is already on the server.

The SHA behaves like an opaque function name: load once, call many times. In Jedis:

private static final String SCRIPT = """
    local count = redis.call('INCR', KEYS[1])
    if count == 1 then
        redis.call('EXPIRE', KEYS[1], ARGV[2])
    end
    return count <= tonumber(ARGV[1]) and 1 or 0
    """;

private volatile String sha;

void init(Jedis jedis) {
    sha = jedis.scriptLoad(SCRIPT);
}

boolean allow(String clientId, Jedis jedis) {
    long window = System.currentTimeMillis() / WINDOW_MS;
    String key = "rl:" + clientId + ":" + window;
    List<String> keys = List.of(key);
    List<String> args = List.of(
        String.valueOf(LIMIT),
        String.valueOf(WINDOW_MS / 1000)
    );

    try {
        return (Long) jedis.evalsha(sha, keys, args) == 1L;
    } catch (JedisNoScriptException e) {
        // Cache evicted (Redis restart, SCRIPT FLUSH). Reload and retry.
        sha = jedis.scriptLoad(SCRIPT);
        return (Long) jedis.evalsha(sha, keys, args) == 1L;
    }
}

KEYS and ARGV are populated from the two list arguments. Redis splits them this way so a sharded cluster can route the call to the shard owning the keys. The NOSCRIPT fallback covers the rare case where the cache was wiped (a Redis restart, or someone ran SCRIPT FLUSH). Higher-level clients like Spring Data Redis and Lettuce wrap this fallback automatically.

References