Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. This can allow the system to continue to function and meet service level agreements, even when an increase in demand places an extreme load on resources.
Context and problem
The load on a cloud application typically varies over time based on the number of active users or the types of activities they are performing. For example, more users are likely to be active during business hours, or the system might be required to perform computationally expensive analytics at the end of each month. There might also be sudden and unanticipated bursts in activity. If the processing requirements of the system exceed the capacity of the resources that are available, it'll suffer from poor performance and can even fail. If the system has to meet an agreed level of service, such failure could be unacceptable.
There are many strategies available for handling varying load in the cloud, depending on the business goals for the application. One strategy is to use autoscaling to match the provisioned resources to the user needs at any given time. This has the potential to consistently meet user demand, while optimizing running costs. However, while autoscaling can trigger the provisioning of additional resources, this provisioning isn't immediate. If demand grows quickly, there can be a window of time where there's a resource deficit.
Solution
An alternative strategy to autoscaling is to allow applications to use resources only up to a limit, and then throttle them when this limit is reached. The system should monitor how it's using resources so that, when usage exceeds the threshold, it can throttle requests from one or more users. This will enable the system to continue functioning and meet any service level agreements (SLAs) that are in place. For more information on monitoring resource usage, see the Instrumentation and Telemetry Guidance.
The system could implement several throttling strategies, including:
- Rejecting requests from an individual user who's already accessed system APIs more than n times per second over a given period of time. This requires the system to meter the use of resources for each tenant or user running an application. For more information, see the Service Metering Guidance.
- Disabling or degrading the functionality of selected nonessential services so that essential services can run unimpeded with sufficient resources. For example, if the application is streaming video output, it could switch to a lower resolution.
- Using load leveling to smooth the volume of activity (this approach is covered in more detail by the Queue-based Load Leveling pattern). In a multi-tenant environment, this approach will reduce the performance for every tenant. If the system must support a mix of tenants with different SLAs, the work for high-value tenants might be performed immediately. Requests for other tenants can be held back, and handled when the backlog has eased. The Priority Queue pattern could be used to help implement this approach.
- Deferring operations being performed on behalf of lower priority applications or tenants. These operations can be suspended or limited, with an exception generated to inform the tenant that the system is busy and that the operation should be retried later.
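The idea of performing work for high-value tenants immediately while holding back other requests can be sketched with a small in-memory priority queue. This is only an illustration: the class name `TenantWorkQueue`, the tenant names, and the convention that a lower number means a more important tenant are all assumptions, not part of the pattern.

```python
import heapq
import itertools


class TenantWorkQueue:
    """Holds requests and releases the most important tenants' work first."""

    def __init__(self):
        self._heap = []
        # Tie-breaker so requests with equal priority keep arrival order.
        self._order = itertools.count()

    def submit(self, tenant, priority, request):
        # Assumed convention: lower number = higher-value tenant.
        heapq.heappush(self._heap, (priority, next(self._order), tenant, request))

    def drain(self, capacity):
        """Return up to `capacity` requests, most important first."""
        batch = []
        while self._heap and len(batch) < capacity:
            _, _, tenant, request = heapq.heappop(self._heap)
            batch.append((tenant, request))
        return batch


queue = TenantWorkQueue()
queue.submit("basic-tenant", 2, "report-1")
queue.submit("premium-tenant", 1, "report-2")
queue.submit("basic-tenant", 2, "report-3")

first_batch = queue.drain(capacity=2)   # limited capacity while the system is busy
second_batch = queue.drain(capacity=2)  # remaining backlog, handled once load has eased
```

In this sketch the premium tenant's request is released first even though it arrived second, and the last basic-tenant request waits for the next batch. A production system would more likely use a durable message queue than an in-process heap.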
The figure shows an area graph for resource use (a combination of memory, CPU, bandwidth, and other factors) against time for applications that are making use of three features. A feature is an area of functionality, such as a component that performs a specific set of tasks, a piece of code that performs a complex calculation, or an element that provides a service such as an in-memory cache. These features are labeled A, B, and C.
The area immediately below the line for a feature indicates the resources that are used by applications when they invoke this feature. For example, the area below the line for Feature A shows the resources used by applications that are making use of Feature A, and the area between the lines for Feature A and Feature B indicates the resources used by applications invoking Feature B. Aggregating the areas for each feature shows the total resource use of the system.
The previous figure illustrates the effects of deferring operations. Just prior to time T1, the total resources allocated to all applications using these features reach a threshold (the limit of resource use). At this point, the applications are in danger of exhausting the resources available. In this system, Feature B is less critical than Feature A or Feature C, so it's temporarily disabled and the resources that it was using are released. Between times T1 and T2, the applications using Feature A and Feature C continue running as normal. Eventually, the resource use of these two features diminishes to the point where, at time T2, there is sufficient capacity to enable Feature B again.
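The T1/T2 behavior described above can be sketched as a small gate that disables the less critical feature when total usage reaches the limit and re-enables it once there is room again. The class name `FeatureGate`, the numeric units, and the separate `resume_below` margin (added so the feature doesn't flap on and off around the limit) are assumptions made for illustration.

```python
class FeatureGate:
    """Disables Feature B at the resource limit and restores it when capacity returns."""

    def __init__(self, limit, resume_below):
        self.limit = limit                # threshold reached just before T1
        self.resume_below = resume_below  # assumed margin used to decide T2
        self.feature_b_enabled = True

    def update(self, usage_a, usage_c, demand_b):
        essential = usage_a + usage_c
        if self.feature_b_enabled:
            # T1: total usage reaches the limit, so shed the less critical feature.
            if essential + demand_b >= self.limit:
                self.feature_b_enabled = False
        elif essential + demand_b < self.resume_below:
            # T2: A and C have eased enough to fit Feature B again.
            self.feature_b_enabled = True
        return self.feature_b_enabled


gate = FeatureGate(limit=100, resume_below=90)
samples = [(40, 30, 20), (50, 35, 20), (50, 35, 20), (35, 25, 20)]
states = [gate.update(a, c, b) for a, c, b in samples]
```

With these sample readings, Feature B is enabled, then disabled for two samples, then enabled again once the combined usage would fall below the resume margin.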
The autoscaling and throttling approaches can also be combined to help keep the applications responsive and within SLAs. If the demand is expected to remain high, throttling provides a temporary solution while the system scales out. Once the additional resources are available, the full functionality of the system can be restored.
The next figure shows an area graph of the overall resource use by all applications running in a system against time, and illustrates how throttling can be combined with autoscaling.
At time T1, the threshold specifying the soft limit of resource use is reached. At this point, the system can start to scale out. However, if the new resources don't become available quickly enough, then the existing resources might be exhausted and the system could fail. To prevent this from occurring, the system is temporarily throttled, as described earlier. When autoscaling has completed and the additional resources are available, throttling can be relaxed.
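As a rough sketch of how the two mechanisms might be coordinated, the function below decides, from the current usage and capacity, whether to request a scale-out and whether to throttle. The `Decision` type, the `decide` function, and the 80% soft limit are hypothetical; a real system would take these signals from its monitoring and autoscaling infrastructure.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    scale_out: bool  # ask the platform for more resources
    throttle: bool   # limit requests until capacity catches up


def decide(usage, capacity, scaling_in_progress=False, soft_limit_ratio=0.8):
    """Throttle while usage is over the soft limit; scale out once per breach."""
    over_soft_limit = usage >= soft_limit_ratio * capacity
    return Decision(
        scale_out=over_soft_limit and not scaling_in_progress,
        throttle=over_soft_limit,
    )


at_t1 = decide(usage=85, capacity=100)
while_scaling = decide(usage=90, capacity=100, scaling_in_progress=True)
after_scaling = decide(usage=90, capacity=150)
```

At T1 the sketch both requests a scale-out and starts throttling. While provisioning is in progress it keeps throttling without issuing another scale-out request, and once the larger capacity is in place the same usage no longer breaches the soft limit, so throttling is relaxed.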
Issues and considerations
You should consider the following points when deciding how to implement this pattern:
- Throttling an application, and the strategy to use, is an architectural decision that impacts the entire design of a system. Throttling should be considered early in the application design process because it isn't easy to add once a system has been implemented.
- Throttling must be performed quickly. The system must be capable of detecting an increase in activity and reacting accordingly. The system must also be able to revert to its original state quickly after the load has eased. This requires that the appropriate performance data is continually captured and monitored.
- If a service needs to temporarily deny a user request, it should return a specific error code so the client application understands that the operation was refused because of throttling. The client application can wait for a period before retrying the request.
- Throttling can be used as a temporary measure while a system autoscales. In some cases it's better to simply throttle rather than scale: if a burst in activity is sudden and isn't expected to be long-lived, scaling can add considerably to running costs.
- If throttling is being used as a temporary measure while a system autoscales, and if resource demands grow very quickly, the system might not be able to continue functioning, even when operating in a throttled mode. If this isn't acceptable, consider maintaining larger capacity reserves and configuring more aggressive autoscaling.
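The point about returning a specific error code can be illustrated for an HTTP-based service, which is an assumption here since the pattern doesn't prescribe a protocol. In HTTP, status 429 (Too Many Requests) signals throttling, and a `Retry-After` header can tell the client how long to wait. The helper names `call_with_retry` and `make_fake_service` are hypothetical, and the fake service stands in for a real network call.

```python
import time

TOO_MANY_REQUESTS = 429  # HTTP status code for throttled requests


def call_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call `send()` and wait as instructed whenever the service throttles us."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != TOO_MANY_REQUESTS:
            return status, body
        if attempt < max_attempts:
            # Honor the server's hint; fall back to 1 second if it is absent.
            sleep(float(headers.get("Retry-After", 1)))
    return status, body


def make_fake_service(throttled_calls):
    """Stand-in service that throttles the first `throttled_calls` requests."""
    remaining = {"count": throttled_calls}

    def send():
        if remaining["count"] > 0:
            remaining["count"] -= 1
            return TOO_MANY_REQUESTS, {"Retry-After": "2"}, "throttled"
        return 200, {}, "ok"

    return send


waits = []  # record the waits instead of actually sleeping
result = call_with_retry(make_fake_service(2), sleep=waits.append)
```

Here the client is throttled twice, waits the advertised two seconds each time, and succeeds on the third attempt. Real clients often add jitter or exponential backoff on top of the server's hint.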
When to use this pattern
Use this pattern:
- To ensure that a system continues to meet service level agreements.
- To prevent a single tenant from monopolizing the resources provided by an application.
- To handle bursts in activity.
- To help cost-optimize a system by limiting the maximum resource levels needed to keep it functioning.
Example
The final figure illustrates how throttling can be implemented in a multi-tenant system. Users from each of the tenant organizations access a cloud-hosted application where they fill out and submit surveys. The application contains instrumentation that monitors the rate at which these users are submitting requests to the application.
In order to prevent the users from one tenant affecting the responsiveness and availability of the application for all other users, a limit is applied to the number of requests per second the users from any one tenant can submit. The application blocks requests that exceed this limit.
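A per-tenant limit on requests per second could be sketched as follows. This is a minimal fixed-window counter, not the implementation used by the survey application; the class name `TenantRateLimiter`, the injectable `clock`, and the tenant names are assumptions chosen for illustration.

```python
import time
from collections import defaultdict


class TenantRateLimiter:
    """Allows each tenant at most `max_per_second` requests per one-second window."""

    def __init__(self, max_per_second, clock=time.monotonic):
        self.max_per_second = max_per_second
        self._clock = clock
        # tenant -> [second of the current window, requests counted in it]
        self._windows = defaultdict(lambda: [None, 0])

    def allow(self, tenant):
        second = int(self._clock())
        window = self._windows[tenant]
        if window[0] != second:
            # A new one-second window has started for this tenant.
            window[0], window[1] = second, 0
        if window[1] >= self.max_per_second:
            return False  # block the request: tenant is over its limit
        window[1] += 1
        return True


now = [100.0]  # fake clock so the example is deterministic
limiter = TenantRateLimiter(max_per_second=2, clock=lambda: now[0])

same_second = [
    limiter.allow("tenant-a"),
    limiter.allow("tenant-a"),
    limiter.allow("tenant-a"),  # third request in the same second is blocked
    limiter.allow("tenant-b"),  # other tenants are unaffected
]
now[0] = 101.0
next_second = limiter.allow("tenant-a")  # allowed again in a new window
```

A fixed window is the simplest option but permits short bursts at window boundaries. Sliding-window or token-bucket limiters smooth this out at the cost of a little more bookkeeping, and a multi-instance deployment would need the counters in shared storage.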
Related patterns and guidance
The following patterns and guidance may also be relevant when implementing this pattern:
- Instrumentation and Telemetry Guidance. Throttling depends on gathering information about how heavily a service is being used. Describes how to generate and capture custom monitoring information.
- Service Metering Guidance. Describes how to meter the use of services in order to gain an understanding of how they are used. This information can be useful in determining how to throttle a service.
- Autoscaling Guidance. Throttling can be used as an interim measure while a system autoscales, or to remove the need for a system to autoscale. Contains information on autoscaling strategies.
- Queue-based Load Leveling pattern. Queue-based load leveling is a commonly used mechanism for implementing throttling. A queue can act as a buffer that helps to even out the rate at which requests sent by an application are delivered to a service.
- Priority Queue pattern. A system can use priority queuing as part of its throttling strategy to maintain performance for critical or higher value applications, while reducing the performance of less important applications.