Design a Rate Limiter.
Key: Italicized lines are for your understanding; they may not be part of the actual interview discussion. The design is far from perfect, but I have tried to capture what you can realistically cover in a 25-35 minute interview.
Rate Limiter: In the HTTP world, a rate limiter caps the number of client requests that are processed over a specified period. Here are a few examples:
- In a banking system, you may request an OTP (One Time Password) only once every 5 minutes.
- On a social media platform, a user may be allowed to publish at most 5 posts per minute.
- You can register new users from the same IP address at most 5 times per day. This is a case where we want to prevent bot/sock-puppet accounts.
Benefits of having a Rate Limiter:
- Helps mitigate DoS/DDoS (Denial of Service / Distributed Denial of Service) attacks. Such attacks can swamp the system and cause resource starvation, leading to extended downtime. A robust rate limiter reduces the chance of such an attack succeeding.
- Cost benefit: With a rate limiting system in place, you control the maximum number of requests your system is allowed to process. This in turn lets you allocate resources elsewhere rather than scaling servers to handle ever-increasing requests per second.
In general, a system design interview goes as follows:
- Gather scope requirements - a high-level idea of the intent, scope, and scale of the system: standalone vs. bespoke service, distributed vs. non-distributed, and so on. This is important because it helps us design with scalability in mind.
- Functional and non-functional requirements: functional requirements pertain to the APIs of your system, and non-functional requirements refer to performance aspects such as availability and low latency.
- Basic design - start with a simple high-level diagram (a few boxes is fine) explaining the flow and the helper services you can think of. Here it is important to ask the interviewer where they want you to deep dive. For example, if you are designing a ride-share service (Uber), the focus depends on the interviewer: maybe they are interested in driver pricing optimisation, or they want to improve the map search portion. Clarify whenever you are uncertain. System design interviews are DISCUSSIONS with NO PERFECT SOLUTION.
- Dive deeper as instructed and expand your design. It is good practice to highlight possible shortcomings/bottlenecks in your design as you build it.
- Validation: validate that your design works, address possible bottlenecks, and seek interviewer feedback.
- Future scope: such as a metric service to measure performance of your design; whether your design will work with 1 million customers, 100 million customers etc. What may work for 1 million users may not scale for 100 million.
- Closing points: this overlaps with the future-scope step above, but it can be a general commentary on your design, and you may also briefly mention other solutions.
Now let us proceed to the actual solution portion:
We can ask a few clarifying questions, such as:
- Is the rate limiter server side or client side?
- Should we inform users/clients who are rate limited?
- What is the expected scale of the system? How much throughput do we need to handle?
- Is rate limiting client specific or system specific, i.e. would we use IP, userId etc. to enforce the rate limiting logic?
Based on the above, I am assuming the following functional requirements:
- Server side rate limiting
- Should be able to handle a large number of requests
- Clients should be informed of being rate limited (proper exception handling)
- For now, let us assume rate limiter is a separate service
- Should have clearly defined APIs, such as allowRequest(), which returns a boolean signifying whether the request is to be processed or dropped
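As a rough illustration of the allowRequest() contract, here is a minimal Python sketch. Only the idea of a boolean-returning allowRequest() comes from the requirements above; the class names, the client_id parameter, and the toy AllowFirstN implementation are hypothetical and exist only to show how a caller would use the interface.

```python
from abc import ABC, abstractmethod
from typing import Dict


class RateLimiter(ABC):
    """Hypothetical interface for the rate limiter service."""

    @abstractmethod
    def allow_request(self, client_id: str) -> bool:
        """Return True to process the request, False to drop it."""


class AllowFirstN(RateLimiter):
    """Toy implementation: allows the first `limit` requests per client.

    It never resets, so it is not a real algorithm; it only exercises
    the interface.
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.seen: Dict[str, int] = {}

    def allow_request(self, client_id: str) -> bool:
        count = self.seen.get(client_id, 0)
        if count >= self.limit:
            return False
        self.seen[client_id] = count + 1
        return True
```

A caller only depends on the boolean: if allow_request() returns False, the request is dropped and the client is told it was rate limited.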
Non-functional requirements
- Highly available service (since our system is a standalone application that will be used by various other services, it should be highly available)
- Low latency: The rate limiting should not impact the application's response time. Our logic should be efficient and optimized
- Fault tolerant: if the rate limiter is down, it should not cause failures in the client applications. Failures need to be handled gracefully.
- Resource optimisation: Rate limiter should not be too expensive to implement
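One possible way to meet the fault-tolerance requirement is to "fail open": if the call to the rate limiter itself fails, the request is let through. The sketch below assumes that policy, which is a design choice and not something the requirements dictate; the function name and the check callable are hypothetical.

```python
from typing import Callable


def allow_or_fail_open(check: Callable[[str], bool], client_id: str) -> bool:
    """Ask the rate limiter for a decision; allow the request if the call fails.

    `check` stands in for a call to the rate limiter service.
    """
    try:
        return check(client_id)
    except Exception:
        # Assumption: fail open, so a rate limiter outage does not take
        # the client application down with it. A stricter system might
        # fail closed (return False) instead.
        return True
```

Failing open protects availability at the cost of briefly running without limits, so it is worth stating the trade-off to the interviewer.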
A very simple design to start with:

Now we can proceed further. At this point we can expand the rate limiter component into a full-fledged design. To do that, we must keep the functional requirements in mind.

Briefly let us go over the job/role of each component
- Client Id system helps us identify the client. As our rate limiter is a plug-and-play component, it must have a mechanism to identify clients and then use the id to pull the rules.
- Rules DB stores the rules for all clients onboarded onto our service.
- Rules Service is responsible for loading rules from the DB into the Rules Cache.
- Rules Cache provides fast access to a client's rules by id (coming from the id system).
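The relationship between the Rules DB, Rules Service, and Rules Cache can be sketched as below. This is a simplified, in-memory illustration: the Rule fields, the class and method names, and the use of plain dictionaries for the DB and the cache are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape: at most max_requests per window_seconds."""
    max_requests: int
    window_seconds: int


class RulesService:
    """Loads rules from the Rules DB into the Rules Cache."""

    def __init__(self, rules_db: Dict[str, Rule]) -> None:
        self._db = rules_db                   # stand-in for the Rules DB
        self._cache: Dict[str, Rule] = {}     # stand-in for the Rules Cache

    def refresh(self) -> None:
        """Copy all rules from the DB into the cache."""
        self._cache = dict(self._db)

    def get_rule(self, client_id: str) -> Optional[Rule]:
        """Fast lookup by client id; None if the client has no cached rule."""
        return self._cache.get(client_id)
```

For example, a client with id "100" that is limited to 5 requests per minute would be stored as Rule(max_requests=5, window_seconds=60), and the rate limiter would read it with get_rule("100") after a refresh().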
I am not diving into the rate limiting algorithm because it depends on the direction your interviewer wants you to take. But do read up on the following important rate limiting algorithms:
- Token Bucket (quite common and very easy to implement)
- Leaky bucket
- Fixed window
- Sliding Window
- Sliding window log
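To give a feel for the first algorithm in the list, here is a minimal single-process token bucket sketch in Python. The class name, parameters, and the injectable clock are choices made for this example; the sketch is not thread-safe and keeps state in memory, so it is not a distributed implementation.

```python
import time
from typing import Callable


class TokenBucket:
    """Bucket holds up to `capacity` tokens and refills at a steady rate."""

    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock            # injectable so tests can control time
        self._tokens = capacity        # start with a full bucket
        self._last = clock()

    def allow_request(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

With capacity 2 and a refill rate of 1 token per second, two back-to-back requests pass, a third is rejected, and one more is allowed after a second has elapsed. The capacity controls how large a burst is tolerated, while the refill rate sets the sustained request rate.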
Validation (that our design works)
Scenario 1: Request arrives => client Id is, say, 100 => Rate Limiter pulls rules for id 100 from Rules Cache => Rate Limiter checks whether enough tokens are available for the request to pass => tokens are available => request allowed and sent to Application server
Scenario 2: Request arrives => client Id is, say, 100 => Rate Limiter pulls rules for id 100 from Rules Cache => Rate Limiter checks whether enough tokens are available for the request to pass => not enough tokens => request rejected and HTTP status code 429 (Too Many Requests) returned to client.
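The two scenarios can be traced with a small Python sketch. It is deliberately simplified: the rules cache is a dictionary mapping client id to a bucket size, tokens are never refilled, and clients without a rule are assumed to be unlimited. All names here are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical rules cache: client id -> bucket size in tokens.
rules_cache: Dict[str, int] = {"100": 2}
# Tokens remaining per client; no refill, to keep the scenarios easy to follow.
tokens_left: Dict[str, int] = {}


def handle_request(client_id: str) -> Tuple[int, str]:
    """Return (HTTP status code, body) for one incoming request."""
    bucket_size = rules_cache.get(client_id)
    if bucket_size is None:
        # Assumption: clients without a rule are not rate limited.
        return 200, "forwarded to application server"
    remaining = tokens_left.setdefault(client_id, bucket_size)
    if remaining > 0:
        tokens_left[client_id] = remaining - 1
        return 200, "forwarded to application server"  # Scenario 1
    return 429, "Too Many Requests"                    # Scenario 2
```

With a bucket size of 2, the first two calls to handle_request("100") follow Scenario 1 and the third follows Scenario 2. A real 429 response would often also carry a Retry-After header so the client knows when to try again.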
Questions for you.
- How will you make it distributed?
- What challenges do you foresee if a distributed rate limiter is implemented?
- Can you identify points of failure in this design?
- How would you handle failure when the Rate Limiter service goes down?
- How can rate limiting be done at other layers of the OSI model?