Basic API Rate-Limiting

You are likely developing some form of (web/RESTful) API, and if it is publicly facing (or even if it’s internal), you usually want to rate-limit it somehow. That is, to limit the number of requests performed over a period of time, in order to save resources and protect against abuse.

This can probably be achieved at the web-server/load-balancer level with some clever configuration, but usually you want the rate limit to be client-specific (i.e. each client of your API should have a separate rate limit), and the way the client is identified varies. It’s probably still possible to do it on the load balancer, but I think it makes more sense to have it at the application level.

I’ll use spring-mvc for the example, but any web framework has a good way to plug in an interceptor.

So here’s an example of a spring-mvc interceptor:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import javax.annotation.PreDestroy;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.servlet.handler.HandlerInterceptorAdapter;

@Component
public class RateLimitingInterceptor extends HandlerInterceptorAdapter {

    private static final Logger logger = LoggerFactory.getLogger(RateLimitingInterceptor.class);
    
    @Value("${rate.limit.enabled}")
    private boolean enabled;
    
    @Value("${rate.limit.hourly.limit}")
    private int hourlyLimit;

    private Map<String, SimpleRateLimiter> limiters = new ConcurrentHashMap<>();
    
    @Override
    public boolean preHandle(HttpServletRequest request, HttpServletResponse response, Object handler)
            throws Exception {
        if (!enabled) {
            return true;
        }
        String clientId = request.getHeader("Client-Id");
        // let non-API requests pass
        if (clientId == null) {
            return true;
        }
        SimpleRateLimiter rateLimiter = getRateLimiter(clientId);
        boolean allowRequest = rateLimiter.tryAcquire();
    
        if (!allowRequest) {
            response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
        }
        response.addHeader("X-RateLimit-Limit", String.valueOf(hourlyLimit));
        return allowRequest;
    }
    
    private SimpleRateLimiter getRateLimiter(String clientId) {
        return limiters.computeIfAbsent(clientId, this::createRateLimiter);
    }

    private SimpleRateLimiter createRateLimiter(String clientId) {
        logger.info("Creating rate limiter for clientId={}", clientId);
        return SimpleRateLimiter.create(hourlyLimit, TimeUnit.HOURS);
    }

    @PreDestroy
    public void destroy() {
        // loop and stop all limiters, releasing their scheduler threads
        limiters.values().forEach(SimpleRateLimiter::stop);
    }
}

This initializes rate limiters per client on demand. Alternatively, on startup you could simply loop through all registered API clients and create a rate limiter for each. If the rate limiter doesn’t allow more requests (tryAcquire() returns false), return “Too many requests” and abort the execution of the request (return false from the interceptor).

This sounds simple, but there are a few catches. You may wonder where SimpleRateLimiter above is defined. We’ll get there, but first let’s see what options we have for rate limiter implementations.

The most recommended one seems to be the guava RateLimiter. It has a straightforward factory method that gives you a rate limiter for a specified rate (permits per second). However, it doesn’t accommodate web APIs very well, as you can’t initialize the RateLimiter with a pre-existing number of permits. That means a period of time has to elapse before the limiter will allow requests. There’s another issue – if you have fewer than one permit per second (e.g. if your desired rate limit is “200 requests per hour”), you can pass a fraction (hourlyLimit / secondsInHour), but it still won’t work the way you expect, as internally there’s a “maxPermits” field that caps the number of stored permits to much less than you’d want. The rate limiter also doesn’t allow bursts – you get exactly X permits per second, but you cannot spread them unevenly over a longer period, e.g. 5 requests in one second and then no requests for the next few seconds. In fact, all of the above can be solved, but sadly only through hidden fields that you don’t have access to. Multiple feature requests have existed for years now, but Guava just doesn’t update the rate limiter, making it much less applicable to API rate-limiting.
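
To illustrate, here’s a quick sketch of the naive attempt (only RateLimiter.create(double) and tryAcquire() are actual guava API; the numbers are an example):

import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

// attempting "200 requests per hour" as a fractional per-second rate
double permitsPerSecond = 200d / TimeUnit.HOURS.toSeconds(1); // ~0.055
RateLimiter limiter = RateLimiter.create(permitsPerSecond);

limiter.tryAcquire(); // the first request passes...
limiter.tryAcquire(); // ...but this one fails, and keeps failing for ~18 seconds
// stored permits are capped at roughly one second's worth (the hidden "maxPermits"),
// so unused permits never accumulate and bursts are impossible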

Using reflection, you can tweak the parameters and make the limiter work. However, it’s ugly, and it’s not guaranteed to work as expected. I have shown here how to initialize a guava rate limiter with X permits per hour, with burstability and full initial permits. But when I thought that would do, I saw that tryAcquire() has a synchronized(..) block. Does that mean all requests will wait for each other when simply checking whether they are allowed to make a request? That would be horrible.
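
For reference, here’s roughly what that reflection hack looks like (a sketch only – “maxPermits” and “storedPermits” are private implementation details of a particular guava version and may change without notice):

import java.lang.reflect.Field;
import com.google.common.util.concurrent.RateLimiter;

public static RateLimiter createHourlyLimiter(int permitsPerHour) throws Exception {
    RateLimiter limiter = RateLimiter.create(permitsPerHour / 3600d);
    // RateLimiter.create(..) returns a SmoothBursty instance; its superclass
    // (SmoothRateLimiter) declares the hidden fields we need to tweak
    Class<?> smoothRateLimiter = limiter.getClass().getSuperclass();

    Field maxPermits = smoothRateLimiter.getDeclaredField("maxPermits");
    maxPermits.setAccessible(true);
    maxPermits.setDouble(limiter, permitsPerHour); // allow bursts up to the full hourly limit

    Field storedPermits = smoothRateLimiter.getDeclaredField("storedPermits");
    storedPermits.setAccessible(true);
    storedPermits.setDouble(limiter, permitsPerHour); // start with all permits available

    return limiter;
}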

So in fact the guava RateLimiter is not meant for (web) API rate-limiting. Maybe keeping it feature-poor is Guava’s way of discouraging people from misusing it?

That’s why I decided to implement something simple myself, based on a Java Semaphore. Here’s the naive implementation:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class SimpleRateLimiter {
    private Semaphore semaphore;
    private int maxPermits;
    private TimeUnit timePeriod;
    private ScheduledExecutorService scheduler;

    public static SimpleRateLimiter create(int permits, TimeUnit timePeriod) {
        SimpleRateLimiter limiter = new SimpleRateLimiter(permits, timePeriod);
        limiter.schedulePermitReplenishment();
        return limiter;
    }

    private SimpleRateLimiter(int permits, TimeUnit timePeriod) {
        this.semaphore = new Semaphore(permits);
        this.maxPermits = permits;
        this.timePeriod = timePeriod;
    }

    public boolean tryAcquire() {
        return semaphore.tryAcquire();
    }

    public void stop() {
        scheduler.shutdownNow();
    }

    public void schedulePermitReplenishment() {
        scheduler = Executors.newScheduledThreadPool(1);
        // every 1 timePeriod, top the semaphore back up to maxPermits;
        // reading availablePermits() and releasing is not atomic, but any
        // discrepancy corrects itself on the next replenishment cycle
        scheduler.scheduleAtFixedRate(() -> {
            semaphore.release(maxPermits - semaphore.availablePermits());
        }, 1, 1, timePeriod);
    }
}

It takes a number of permits (the allowed number of requests) and a time period. The time period is “1 X”, where X can be a second/minute/hour/day – depending on how you want your limit configured: per second, per minute, hourly, daily. Every 1 X a scheduler replenishes the acquired permits. (In the example above there’s one scheduler per client, which may be inefficient with a large number of clients – you can pass a shared scheduler pool instead.) There is no control for bursts (a client can spend all its permits in a rapid succession of requests), there is no warm-up functionality, and there is no gradual replenishment. Depending on what you want, this may not be ideal, but it’s just a basic rate limiter that is thread-safe and doesn’t do any blocking. I wrote a unit test to confirm that the limiter behaves properly, and also ran performance tests against a local application to make sure the limit is obeyed. So far it seems to be working.
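
Here’s a minimal usage sketch (JUnit-style; the numbers are arbitrary):

import static org.junit.Assert.*;
import java.util.concurrent.TimeUnit;

// 5 permits per minute
SimpleRateLimiter limiter = SimpleRateLimiter.create(5, TimeUnit.MINUTES);

for (int i = 0; i < 5; i++) {
    assertTrue(limiter.tryAcquire()); // the first 5 requests pass
}
assertFalse(limiter.tryAcquire()); // the 6th is rejected until the next replenishment

limiter.stop(); // release the scheduler thread when done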

Are there alternatives? Well, yes – there are libraries like RateLimitJ that use Redis to implement rate-limiting. That would mean, however, that you need to set up and run Redis, which seems like overhead for “simply” having rate-limiting. (Note: it also has an in-memory version.)

On the other hand, how would rate-limiting work properly in a cluster of application nodes? Do the nodes need some database or gossip protocol to share data about the per-client permits (requests) remaining? Not necessarily. A very simple approach would be to assume that the load balancer distributes the load equally among your nodes. That way you would just have to set the limit on each node to the total limit divided by the number of nodes. It won’t be exact, but you rarely need it to be – allowing 5-10 extra requests won’t kill your application, and allowing 5-10 fewer won’t be dramatic for the users.
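
In code that’s just a division (a sketch assuming a fixed, known cluster size):

int totalHourlyLimit = 200; // the limit advertised to the client
int numberOfNodes = 4;      // nodes behind the load balancer
int perNodeLimit = totalHourlyLimit / numberOfNodes; // 50 requests/hour on each node

SimpleRateLimiter limiter = SimpleRateLimiter.create(perNodeLimit, TimeUnit.HOURS);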

That, however, means that you have to know the number of application nodes. If you employ auto-scaling (e.g. in AWS), the number of nodes may change depending on the load. In that case, instead of configuring a hard-coded number of permits, the replenishing scheduled job can calculate “maxPermits” on the fly, by calling an AWS (or other cloud-provider) API to obtain the number of nodes in the current auto-scaling group. That would still be simpler than supporting a Redis deployment just for that.
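
For example, with the AWS SDK for Java this could look roughly like the sketch below (“my-api-asg” is a placeholder for your auto-scaling group’s name):

import java.util.List;
import com.amazonaws.services.autoscaling.AmazonAutoScaling;
import com.amazonaws.services.autoscaling.AmazonAutoScalingClientBuilder;
import com.amazonaws.services.autoscaling.model.AutoScalingGroup;
import com.amazonaws.services.autoscaling.model.DescribeAutoScalingGroupsRequest;

private int calculatePerNodePermits(int totalHourlyLimit) {
    AmazonAutoScaling client = AmazonAutoScalingClientBuilder.defaultClient();
    List<AutoScalingGroup> groups = client
            .describeAutoScalingGroups(new DescribeAutoScalingGroupsRequest()
                    .withAutoScalingGroupNames("my-api-asg")) // placeholder group name
            .getAutoScalingGroups();
    int numberOfNodes = groups.get(0).getInstances().size();
    return Math.max(1, totalHourlyLimit / numberOfNodes);
}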

Overall, I’m surprised there isn’t a “canonical” way to implement rate-limiting (in Java). Maybe the need for rate-limiting is not as common as it may seem. Or it’s implemented manually – by temporarily banning API clients that use “too many resources”.

Update: someone pointed out the bucket4j project, which seems nice and worth taking a look at.

17 thoughts on “Basic API Rate-Limiting”

  1. I was also surprised to find that Java didn’t provide a robust library for rate limiting which I think is a growing problem as the number of mission critical APIs increases.

    Thanks for the link to RateLimitJ. I would like to point out that RateLimitJ has an in-memory implementation for systems not concerned with exact counts across a distributed system, see https://github.com/mokies/ratelimitj/tree/master/ratelimitj-inmemory. However, there are use cases where the limit needs to be low, such as password attempts per account where inexact limit counts would be problematic.

    Cheers.

  2. Thanks for sharing the code.

    I just happened to implement a rate-limiting function recently – this is a really nice tip. 🙂

  3. Thanks.

    I took a look at the Guava RateLimiter implementation; this is a great post nonetheless.

    I will look at the RateLimitJ project.

    Thanks.

  4. Hi Bozho,

    Another option, if you have full control of the stack, would be to use a service like CloudFlare to implement rate limiting.

    Hope you’re well.

    Jon

  5. @mokies thanks, added a note

    @Jon hey 🙂 CloudFlare is an interesting option. Does it have the flexibility for dynamic/per-client configurations?

  6. It is ultimately better to replace String#intern + “synchronized” with computeIfAbsent, in the following way:

    private Function<String, SimpleRateLimiter> limiterSupplier = this::createRateLimiter;

    private SimpleRateLimiter getRateLimiter(String clientId) {
        return limiters.computeIfAbsent(clientId, limiterSupplier);
    }

    As a result:
    1. The code becomes smaller and simpler.
    2. The JVM string pool does not grow.

  7. Of course I understand that SimpleRateLimiter is just a mockup, but creating one thread per IP address is really overkill; there is no excuse for it:
    1. 5000 different clients will use 5000 OS threads; most likely the JVM will be killed by the OOM killer.
    2. Tasks placed on executors consume CPU even when a client does not use the service. Imagine a client that called the service one time, a week ago, and never since – unfortunately the service keeps spending CPU resources managing the limits associated with that client.
    3. To refill the available permits you do not need any background work AT ALL – just do the refill at acquire time, following this example: https://github.com/vladimir-bukhtoyarov/bucket4j/blob/2.1/doc-pages/token-bucket-brief-overview.md#example-of-basic-java-token-bucket-implementation

  8. @Vladimir yes, computeIfAbsent is better. I will implement it that way.

    But note that the threads are NOT per IP address – they are per API client. Depending on the number of customers, that can be a very practical solution. Of course, it’s better to avoid the extra threads, but it’s not such an overkill even for medium-sized applications.

    I’ll try to make it clearer that clientId is not an IP, but an API client identifier.

  9. @Vladimir

    in the code there is some issue.

    this.refillTokensPerOneMillis = (double) refillPeriodMillis / (double) refillPeriodMillis;

    that would always be 1.

    Also, if (currentTimeMillis > lastRefillTimestamp) { – this means that every request (apart from requests in the same millisecond) will trigger the refill code, right?

  10. Rate limiting is one of the things for which I prefer to make use of an API gateway in front of the application. That said, I think it would be nice to design an application-code solution that wouldn’t imply maintaining state on the back-end. Any ideas around this?

  11. @Bozho: Thanks a lot for this insightful post, really helped me in understanding basics about Rate limiting.
    Just a nitpick, in method “createRateLimiter” you are calling
    “SimpleRateLimiter.create(hourlyLimit, TimeUnit.HOURS, scheduler, applicationId);” but in SimpleRateLimiter class there is no such method which takes 4 arguments.

  12. Isn’t there a race condition here?

    scheduler.scheduleAtFixedRate(() -> {
        semaphore.release(maxPermits - semaphore.availablePermits());
    }, 1, 1, timePeriod);

    A thread could acquire a permit between you calling availablePermits() and semaphore.release(). Then you’d be releasing the wrong number of permits. This might result in fewer permits actually being released than there should be. That may not be fatal – provided the race doesn’t occur again on the next release cycle, the available count goes back to the max.
