Rate Limiting and Blocking Documentation
Introduction
Configuration Parameters
Blocking Behavior Summary
Timers and Schedules
High-Throughput Considerations
This document describes the configuration, behavior, and operational tasks of the system's rate limiting and blocking module. The system monitors service usage, enforces request thresholds, and optionally escalates repeated violations to a user-level block.
It separates detection from enforcement, allowing applications to define custom responses to limit breaches. Configuration includes limits on requests per minute (RPM), requests per day (RPD), escalation rules, notification settings, temporary data retention, and administrative overrides.
Additionally, a set of timers and scheduled tasks ensures rate-limiting data is maintained, evaluated, and purged automatically to support both operational reliability and analytics accuracy.
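The configuration surface described above can be pictured as a single settings object. The sketch below is illustrative only: the field names (`requests_per_minute`, `retention_days`, etc.) are assumptions chosen to mirror the parameters listed in the introduction, not the module's actual parameter names.

```python
from dataclasses import dataclass

# Illustrative configuration shape; every field name here is an
# assumption mirroring the parameters described in this document.
@dataclass
class RateLimitConfig:
    requests_per_minute: int = 60         # RPM threshold
    requests_per_day: int = 10_000        # RPD threshold
    escalate_to_user_block: bool = False  # escalate repeated violations
    notify_on_block: bool = True          # notification settings
    retention_days: int = 30              # temporary data retention
    admin_override: bool = False          # administrative override

# Individual limits can be overridden per deployment:
config = RateLimitConfig(requests_per_minute=120)
```

A dataclass keeps defaults explicit and makes per-environment overrides a one-line change.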
Rate Limit Evaluation Methods
1. RateLimitEvaluationToUsers
Evaluates whether a user should be blocked based on service and application usage. This method does not track individual machines.
Inputs:
ServiceName: The service being used.
AppName: The client application name.
Output:
BlockRequest: Boolean flag indicating whether the user exceeds the rate limit.
2. RateLimitEvaluationToAPI
Evaluates whether an incoming API request should be blocked based on the service, application, and machine ID.
Inputs:
ServiceName: The service being called.
AppName: The client application name.
MachineId: Unique identifier for the client machine.
Output:
BlockRequest: Boolean flag indicating whether the request exceeds the rate limit.
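The two evaluation methods differ only in what they key on. A minimal sketch, assuming a simple sliding one-minute window (the real module's windowing and signatures are not specified here; `RPM_LIMIT` and the in-memory store are placeholders):

```python
import time
from collections import defaultdict

RPM_LIMIT = 5  # illustrative threshold; the real value comes from configuration

_recent = defaultdict(list)  # key -> timestamps of requests in the last minute

def _over_limit(key, now=None):
    """Record one request for `key` and report whether it exceeds RPM_LIMIT."""
    now = time.time() if now is None else now
    window = [t for t in _recent[key] if now - t < 60]  # keep last 60 s
    window.append(now)
    _recent[key] = window
    return len(window) > RPM_LIMIT

def RateLimitEvaluationToUsers(ServiceName, AppName):
    # User-level evaluation: keyed by service and application only;
    # individual machines are not tracked.
    return {"BlockRequest": _over_limit((ServiceName, AppName))}

def RateLimitEvaluationToAPI(ServiceName, AppName, MachineId):
    # API-level evaluation: the machine ID becomes part of the key.
    return {"BlockRequest": _over_limit((ServiceName, AppName, MachineId))}
```

The sixth call within a minute for the same key flips BlockRequest to true, while a different machine ID keeps its own independent window in the API-level variant.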
Blocking Behavior Summary
Service-Level Block: Temporary; access is automatically restored once usage falls below thresholds. No administrative action is required.
User-Level Block: Permanent (if enabled); manual administrative intervention is required to restore access.
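The contrast between the two block types can be sketched as state that either expires on its own or persists until an administrator clears it. The cooldown length and escalation threshold below are placeholders, not the module's actual values:

```python
import time

SERVICE_BLOCK_SECONDS = 60  # illustrative cooldown (assumption)
USER_BLOCK_THRESHOLD = 3    # violations before user-level escalation (assumption)

class BlockState:
    def __init__(self):
        self.service_blocked_until = 0.0  # temporary: expires on its own
        self.user_blocked = False         # permanent until admin intervenes
        self.violations = 0

    def record_violation(self, now=None):
        now = time.time() if now is None else now
        self.violations += 1
        self.service_blocked_until = now + SERVICE_BLOCK_SECONDS
        if self.violations >= USER_BLOCK_THRESHOLD:
            self.user_blocked = True  # only admin_unblock() clears this

    def is_blocked(self, now=None):
        now = time.time() if now is None else now
        return self.user_blocked or now < self.service_blocked_until

    def admin_unblock(self):
        """Manual administrative intervention restoring access."""
        self.user_blocked = False
        self.violations = 0
```

Note how `is_blocked` needs no cleanup task for service-level blocks: they lapse simply by comparing against the clock, whereas the user-level flag never changes without an explicit administrative call.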
The system also provides notifications and temporary data retention to support monitoring and analytics.
Timers and Scheduled Tasks
The system relies on several timers and scheduled tasks to maintain data integrity, evaluate usage, and perform automatic operations.
These timers ensure the system remains consistent, efficient, and responsive, maintaining accurate usage statistics and analytics for both service-level and user-level monitoring.
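One of the scheduled tasks described here is the automatic purge of expired rate-limiting data. A minimal sketch of such a timer, assuming a re-arming `threading.Timer` and placeholder interval and retention values (the module's actual schedule is not specified in this document):

```python
import threading
import time

PURGE_INTERVAL_SECONDS = 3600   # illustrative cadence (assumption)
RETENTION_SECONDS = 30 * 86400  # temporary-data retention window (assumption)

def purge_expired(entries, now=None):
    """Keep only usage records younger than the retention window."""
    now = time.time() if now is None else now
    return [e for e in entries if now - e["timestamp"] < RETENTION_SECONDS]

def schedule_purge(store):
    """Purge the store immediately, then re-arm a timer to repeat on schedule."""
    store["entries"] = purge_expired(store["entries"])
    timer = threading.Timer(PURGE_INTERVAL_SECONDS, schedule_purge, args=(store,))
    timer.daemon = True  # do not keep the process alive just for the purge
    timer.start()
    return timer
```

Separating the purge predicate (`purge_expired`) from the scheduling loop keeps the retention rule testable without spinning up real timers.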
High-Throughput Considerations
The system is designed to handle high request volumes efficiently while maintaining accurate rate limiting and blocking behavior. It uses a combination of cached block flags and asynchronous processing, with an optional mode that bypasses the cache when strict consistency is needed.
Request Flow
Cache Read (Configurable) – Each incoming request normally checks a cached table containing the current service- and user-level block flags.
If configured for uncached reads, the system queries the primary store directly, ensuring the most up-to-date block state.
Record Request – The request is recorded (e.g., in a queue or temporary store) for asynchronous processing.
Asynchronous Processing – New entries are asynchronously processed to update the blockages table, evaluate escalation rules, and trigger notifications when necessary.
Note: Because blockages are processed asynchronously, it may take a few seconds for a block to be raised after the limit has been triggered.
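The three-step flow above, including the propagation delay the note warns about, can be sketched as follows. The cache, store, and per-key limit are in-memory stand-ins (assumptions), not the module's real storage:

```python
import queue

LIMIT = 3             # illustrative per-key limit (assumption)
block_cache = {}      # cached service-/user-level block flags
primary_store = {}    # authoritative block state
counts = {}           # usage counters maintained by the async worker
request_log = queue.Queue()  # recorded requests awaiting processing

def handle_request(key, use_cache=True):
    """Steps 1-2: consult the block flags, then record the request."""
    flags = block_cache if use_cache else primary_store
    if flags.get(key, False):
        return "blocked"
    request_log.put(key)  # recorded for asynchronous processing
    return "accepted"

def process_pending():
    """Step 3: drain the log, update block state, refresh the cache."""
    while not request_log.empty():
        key = request_log.get()
        counts[key] = counts.get(key, 0) + 1
        if counts[key] > LIMIT:
            primary_store[key] = True
            block_cache[key] = True  # the block becomes visible only now
```

Because `process_pending` runs after the fact, requests that arrive between exceeding the limit and the next processing pass are still accepted, which is exactly the propagation delay described in the note.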
Benefits
Low Latency (Cached Mode): Per-request evaluation reads from the cache, avoiding database hits and reducing response times.
Consistency (Uncached Mode): Direct reads from the primary store ensure the latest block state, at the cost of higher latency.
Asynchronous Scalability: Updates and analytics processing occur asynchronously, reducing bottlenecks on the critical request path.
Comprehensive Analytics and Escalation: Request data is persisted and processed to support historical tracking and user-level block escalations.
Flexible Enforcement: Supports service-level and user-level blocking with configurable burst tolerance and escalation thresholds.
Considerations and Limitations
Cache Consistency: Cached block flags may be slightly outdated, potentially allowing some requests over the limit.
Asynchronous Lag: High traffic bursts may delay updates to user-level blocks or notifications.
Performance Trade-Off: Uncached reads ensure accurate enforcement but can introduce higher latency, which may impact throughput under heavy demand.
High-Demand Limitations: While the system scales well for moderate to high volumes, extreme or sustained peaks may require careful tuning of cache size, eviction policies, and asynchronous processing frequency.
Propagation Delay: Blocks may not be applied instantaneously due to asynchronous processing; a short delay of several seconds is possible before the block takes effect.
This design provides a balance between performance and consistency, allowing operators to configure cached or uncached reads depending on operational priorities. It is well-suited for enterprise or internal APIs but may face limitations under extremely high-demand loads.