From a high-level perspective, how can I implement an API usage quota system?
In particular, it must fulfill the following requirements:
- real-time
- fast enough not to slow down the API significantly
- if it uses in-memory caches, it must recover after a sudden shutdown (a small loss of quota precision in the API client's favor is OK)
- rate limiting (DoS protection)
- scales well
Are there any generally accepted architectural patterns / algorithms for implementing such systems?
Do you have a database available to your API? If so, simply store a counter in it for each registered account that you want to measure or throttle.
Once someone has logged on, use a technique like aspect-oriented programming (AOP) to ensure that each API call runs through your throttling algorithm, which should be simple. Pseudo-code for a 24-hour throttling system:
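A minimal sketch of what such a throttle could look like, under a few assumptions of mine: Python, an in-memory SQLite database standing in for whatever DB the API already has, and a decorator playing the role of the AOP interception. The names `quota`, `DAILY_LIMIT`, `throttled`, `QuotaExceeded` and `get_report` are all hypothetical.

```python
import functools
import sqlite3

DAILY_LIMIT = 3  # hypothetical quota, kept tiny for the demo

# Assumption: in-memory SQLite stands in for the API's real database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE quota (account TEXT PRIMARY KEY, calls INTEGER NOT NULL DEFAULT 0)"
)


class QuotaExceeded(Exception):
    """Raised when an account has used up its daily allowance."""


def throttled(func):
    """Wrap an API handler so every call passes through the quota check."""

    @functools.wraps(func)
    def wrapper(account, *args, **kwargs):
        # Make sure the account has a counter row (no-op if it already exists).
        db.execute("INSERT OR IGNORE INTO quota (account) VALUES (?)", (account,))
        # Increment only while under the limit; one UPDATE statement does
        # both the check and the increment.
        cur = db.execute(
            "UPDATE quota SET calls = calls + 1 WHERE account = ? AND calls < ?",
            (account, DAILY_LIMIT),
        )
        db.commit()
        if cur.rowcount == 0:
            # No row was updated, so the account is already at its limit.
            raise QuotaExceeded(account)
        return func(account, *args, **kwargs)

    return wrapper


@throttled
def get_report(account):
    # Hypothetical API endpoint.
    return f"report for {account}"
```

With `DAILY_LIMIT = 3`, the first three `get_report("alice")` calls succeed and the fourth raises `QuotaExceeded`, while other accounts are unaffected. Doing the check and the increment in one `UPDATE` avoids a read-then-write race between concurrent requests.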
The above assumes that you have a batch job that walks the DB nightly and clears all the access counters back to 0 for the next day’s traffic.
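The nightly reset itself can be tiny. Here is a rough sketch, again assuming Python and SQLite with a hypothetical `quota` table holding one `calls` counter per account; how the job gets scheduled (cron, a task queue, etc.) is left out.

```python
import sqlite3


def reset_counters(db):
    """Set every account's call counter back to 0; returns the number of rows touched."""
    cur = db.execute("UPDATE quota SET calls = 0")
    db.commit()
    return cur.rowcount


# Demo against an in-memory database (assumption: the real job would
# connect to the API's actual database instead).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE quota (account TEXT PRIMARY KEY, calls INTEGER NOT NULL)")
db.executemany("INSERT INTO quota VALUES (?, ?)", [("alice", 120), ("bob", 7)])
db.commit()

reset_counters(db)
remaining = db.execute("SELECT SUM(calls) FROM quota").fetchone()[0]
```

After the reset, `remaining` is 0, so every account starts the next day with a full quota.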
The scalability of this will depend on your DB choice. Any DB can handle a simple per-account counter like this, and the newer NoSQL/NewSQL ones are especially well suited to it.