Kubernetes Resource Requests and Limits

April 2, 2023

Moch Lutfi (@kaptenupi)

Resource Requests and Limits are an essential feature of Kubernetes Pods that help determine where they can and should be deployed. Kubernetes supports two main resource types for this purpose: CPU and memory.

Requests

A request is the amount of a resource reserved for a Pod. For instance, if you request one CPU core, the scheduler will only place your Pod on a Kubernetes Node that has at least one CPU core unreserved.

Limits

Limits represent the maximum amount of a resource a Pod is allowed to use. For example, if you set a memory limit of 1Gi for your Pod and it tries to use more than that, the container is killed with an OOMKilled error (a container exceeding its CPU limit, by contrast, is throttled rather than killed).
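
As a minimal sketch of how this looks in practice (the Pod name, image, and values are illustrative, not a recommendation), both requests and limits are declared per container in the Pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod              # placeholder name
spec:
  containers:
    - name: app
      image: example/app:latest  # placeholder image
      resources:
        requests:
          cpu: "1"               # only scheduled on a Node with at least 1 CPU core unreserved
          memory: 1Gi
        limits:
          cpu: "2"
          memory: 1Gi            # using more than 1Gi gets the container OOM-killed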

Importance of Resource Requests and Limits

Resource Requests and Limits play a crucial role in ensuring the proper functioning of Kubernetes clusters. The following are some reasons why they are significant:

  • Ensure that pods have adequate resources to perform their tasks.
  • Prevent other pods from affecting your pod by using up resources that your pod requires.
  • Ensure that the Kubernetes cluster has enough Nodes to support your services.
  • Ensure that the Kubernetes cluster doesn't have too many nodes, which could lead to a waste of budget.

In short, Resource Requests and Limits let the Kubernetes scheduler place Pods sensibly and ensure that they have enough CPU and memory to perform their intended tasks.

Measuring CPU and Memory Usage

Before deploying your service to production, it's important to measure your application's CPU and memory usage; a load test is a good time to do this. Avoid setting very low limits or requests before the first production deployment. It's safer to lower them once you know how much your service really needs.

To see the current resource usage of your service in a Docker container, you can use `docker stats`. If you're running your service in Kubernetes, you can use `kubectl top pod -n ${NAMESPACE}` to see the current resource usage.

In addition to CPU usage, it's important to monitor CPU throttling (`docker.cpu.throttled` on Datadog). Under normal production load, this metric should be almost always 0. If it's frequently non-zero, your CPU limits are too low.

Recommendations for Initial Limits and Requests

The following are recommended initial limits and requests for a stateless microservice. You can use these until you have performed load tests to understand the actual resources needed by your service. Note that if your service keeps a significant amount of data cached in memory or has to handle a high number of concurrent requests, these numbers may not be appropriate.


requests:
  cpu: 500m      # 1/3 of limits.cpu; should be low enough to keep CPU usage equal to
                 # the HPA target multiplied by requests.cpu
  memory: 256Mi  # should be the same as limits.memory, and high enough to avoid
                 # OOM during normal operations and in case of dependency failures
limits:
  cpu: 1500m     # 3x requests.cpu; should be high enough to keep `docker.cpu.throttled`
                 # down to 0 during normal operations
  memory: 256Mi  # should be the same as requests.memory
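
For context on where this fragment goes, here is a sketch of the same values inside a Deployment manifest (the name and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service              # placeholder name
spec:
  replicas: 3                        # at least 3 pods for availability (see the recommendations below)
  selector:
    matchLabels:
      app: example-service
  template:
    metadata:
      labels:
        app: example-service
    spec:
      containers:
        - name: app
          image: example/app:latest  # placeholder image
          resources:
            requests:
              cpu: 500m
              memory: 256Mi
            limits:
              cpu: 1500m
              memory: 256Mi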

After performing a representative load test of production workloads and especially after releasing in production, you should review your resource requests and limits and fix them as needed. Refer to the next section for general recommendations on how to choose limits and requests.

Recommendations for Limits and Requests

Keep in mind that these recommendations may not apply to all services or workloads. Do not blindly apply them without load tests!

  • In general, it's advisable to have many small pods instead of very few big pods, and you should have at least 3 pods for availability.
  • Avoid running too many pods, as this can cause resource exhaustion or overload in your dependencies (e.g., too many open connections on your database or caching server), make debugging/troubleshooting more difficult, and slow down the deployment process.
  • Allocate just enough resource headroom to ensure that your pods can work correctly under nominal conditions and likely exceptional circumstances. Avoid needlessly reserving too many resources.
  • Document the rationale for choosing specific values for the requests and limits.
  • Memory limits should be equal to memory requests. This makes it unlikely for Kubernetes to kill your pod due to the memory consumption of other pods.
  • The overall CPU utilization (`docker.cpu.usage`) of your deployment should ideally sit at the target set for the HPA (e.g., if your CPU request is 500m and your HPA target is 65%, the CPU usage in each pod should stay around 65% of 500m, i.e., ~325m); a sample HPA manifest is sketched after this list. Considering that the recommendation is to set HPA min replicas
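
As a sketch of the HPA setup referenced above (the names and the max replica count are illustrative), an autoscaler targeting 65% of the CPU request could look like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-service        # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-service
  minReplicas: 3               # at least 3 pods for availability
  maxReplicas: 10              # illustrative upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65   # percentage of the CPU request (500m -> ~325m)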

Golang services

For Golang services, the garbage collector (GC) can interact badly with the Completely Fair Scheduler (CFS) and lead to high tail latencies, in part because the Go runtime defaults GOMAXPROCS to the number of cores on the Node rather than the container's CPU limit. To mitigate this issue, the following recommendations can be followed:

  • Set GOMAXPROCS to floor(limit.cpu * 2 / 3) or use a tool like github.com/uber-go/automaxprocs to set it automatically.
  • The CPU limit should be set to 2x~4x the CPU request.
  • The CPU limit should not be less than 1000m and ideally at least 1500m since GOMAXPROCS cannot be less than 1.
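
As a sketch in the container spec (values are illustrative and assume the 1500m CPU limit from the initial recommendations), GOMAXPROCS can be pinned via an environment variable:

resources:
  limits:
    cpu: 1500m                # floor(1500m * 2/3) = 1 core
env:
  - name: GOMAXPROCS
    value: "1"                # floor(limit.cpu * 2/3); alternatively, importing
                              # github.com/uber-go/automaxprocs sets it from the CFS quota at startup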

Note that these recommendations are specific to Golang services and should be used only if tail latency is a concern.
