Concept Overview


Throttling is a process that is used to control the usage of APIs by consumers during a given period. You can define throttling at the application level and API level. Throttling limit is considered as cumulative at API level. 

Administrators and publishers of API manager can use throttling to limit the number of API requests per day/week/month. For example, you can limit the number of total API requests as 10000/day.

When a throttle limit is crossed, the server sends 429 message as HTTP status to the user with message content as "too many requests".

Throttling in API manager can be of two types: Hard and soft.

Hard: The number of API requests cannot exceed the throttle limit.

Soft: In this type, you can set the API request limit to exceed a certain percentage. For example, if you set the exceed limit to 90%, user gets a notification when the exceed limit is crossed.


Rate-limiting is a process that is used to define the rate at which consumers can access APIs. Also, it determines the speed at which a consumer can access APIs. Rate limit is calculated in real time. 

Rate-limiting is applicable to resources. For example, in the following endpoint URL, "http://endpointurl/products", "/products" is a resource. It can be configured or over-ridden resource level.

You can add rate limits to API resources for the SLA plans to which the APIs are subscribed. Once the API exceeds the rate limit, the subscriber gets Status 429 message in the response header. Status 429 message indicates that the rate-limit has crossed. 

For more information on adding rate limits to API resources, refer Adding rate limits to API resources section in Publisher document. Publisher assigns multiple plans for APIs while publishing them. Subscriber can select one of the plans while consuming APIs. 

Administrators and publishers of API manager can use rate limiting to define the number of API requests per second/minute/hour.  For example, if you set the rate as 5 req/sec, the speed at which a consumer can access each API can be high.

If API Manager is deployed in a cluster, the rate-limit is considered across the nodes. The consumer gets an error message whenever the defined limit is crossed within a single node or across a combination of nodes. 

In API manager, two types of algorithms can be used to limit the rate: Rolling and Fixed.

In fixed window algorithm, the period is considered from the starting of the time unit to the end of the time unit. For example, a period is considered as 0-60 seconds for a minute irrespective of the time frame at which the API request has been made.

In rolling window algorithm, the period is considered from the fraction of the time at which the request has been made to the end of the time unit. For example, if two requests for API calls are made at 30th second and 40th second of a minute it is considered as two requests from 30th second of that minute up to the 30th second of next minute. 

Consider an illustration as follows: 


Let us assume there is a rate limit as 2 req/sec. In the above snapshot, you can notice two requests/occurrences before 1.0 second and two more occurrences after 1.0 second.

If you apply fixed window algorithm to this illustration, then two requests/occurrences are considered within 1 second. If you apply rolling window algorithm, then four occurrences are considered within 1 second. 

SLA configuration

SLA configuration in API manager enforces access control to the users through rate-limiting and throttling. For instructions to configure SLA as an Administrator, refer to the Administrator document

As a publisher, you can create a subscription plan and assign throttling limits to an API. For more information on specific instructions, refer to Add tiers to an API section of Publisher document.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License  Twitter™ and Facebook posts are not covered under the terms of Creative Commons.

Legal Notices   |   Online Privacy Policy