Performance Testing Made Simple


What is it?

Performance testing is “determining how a system performs in terms of responsiveness and stability under a [reasonable] workload”.

It’s a useful step to sense-check that a new service or endpoint you’re building can handle the traffic it’s expected to receive before rolling it out to live users.

Even if you’re planning to roll it out to users incrementally (with a feature flag), it’s a valuable exercise for understanding the limits of your system.

Here are some pointers to keep in mind.

Set up

Typically, we want to understand how the application will behave in production without actually adding any artificial load to that environment (unless you’re doing chaos engineering).

This usually involves:

Configuring the application in a pre-production or sandbox environment to run with resourcing matching its production configuration
Preparing a suite of requests to execute against the application to simulate realistic workloads
Preparing to monitor key metrics to check the application is behaving as expected
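As a rough illustration of the second and third steps, here is a minimal sketch of a closed-loop load generator using only the Python standard library. The names `fake_request`, `timed_call` and `run_load` are hypothetical, and `fake_request` is a stand-in that merely sleeps; in a real test you would swap it for a call to your own client against the pre-production environment, or use a dedicated load-testing tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(i):
    """Hypothetical stand-in for one request; replace with a real client call."""
    time.sleep(0.001)  # pretend the service takes about 1 ms to respond
    if i % 50 == 0:
        raise RuntimeError("simulated failure")


def timed_call(send, i):
    """Run one request and return (latency_seconds, succeeded)."""
    start = time.perf_counter()
    try:
        send(i)
        ok = True
    except Exception:
        ok = False
    return time.perf_counter() - start, ok


def run_load(send, total_requests, concurrency):
    """Issue total_requests calls using `concurrency` worker threads.

    Returns the per-request results and the total elapsed wall-clock time.
    """
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda i: timed_call(send, i), range(total_requests)))
    elapsed = time.perf_counter() - started
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_load(fake_request, total_requests=200, concurrency=10)
    failures = sum(1 for _, ok in results if not ok)
    print(f"{len(results)} requests in {elapsed:.2f}s, {failures} failed")
```

Because each worker waits for a response before sending the next request, this models a fixed number of concurrent clients rather than a fixed arrival rate, which may or may not match your production traffic.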

Establishing metrics to capture

The following high-level API metrics are useful as a starting point:

Throughput (requests per second or RPS)
Latency (response time in percentiles e.g. P90, P99, P99.9)
Error rate
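To make these three metrics concrete, the sketch below derives them from a list of recorded samples. The `Sample` type and the `summarize` function are assumptions made for illustration, and the percentile uses the nearest-rank method; monitoring tools often interpolate or use histograms instead, so their figures may differ slightly.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    latency_s: float  # wall-clock response time in seconds
    ok: bool          # True if the request succeeded


def percentile(sorted_values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(samples, duration_s):
    """Compute throughput, latency percentiles and error rate for one test run."""
    latencies = sorted(s.latency_s for s in samples)
    errors = sum(1 for s in samples if not s.ok)
    return {
        "throughput_rps": len(samples) / duration_s,
        "p90_s": percentile(latencies, 90),
        "p99_s": percentile(latencies, 99),
        "p99_9_s": percentile(latencies, 99.9),
        "error_rate": errors / len(samples),
    }
```

Percentiles are reported rather than the mean because a small number of very slow requests can hide behind a healthy-looking average.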

When results are worse than expected, it is useful to drill into possible bottlenecks, such as:

CPU usage
Concurrency (e.g. thread or lock contention)
I/O performance (e.g. database queries, event throughput/lag, cache performance)

Consider independent test variables

Sometimes, you will want to thoroughly investigate how the application performs under different sets of conditions.

Here are some examples of independent variables:

Number of application instances running. At what point does the database become the bottleneck as the application scales up?
Concurrency. If there is any contention, how is performance affected by the number of resources being operated on concurrently?
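One way to explore a variable like concurrency is to repeat the same workload at several levels and compare the throughput. The sketch below is illustrative only: `contended_operation` is a hypothetical operation that serializes all callers on a single shared lock, and `measure_throughput` and `sweep` are made-up helper names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

_lock = threading.Lock()


def contended_operation(_):
    """Hypothetical operation whose critical section is serialized by a shared lock."""
    with _lock:
        time.sleep(0.001)  # simulated work done while holding the lock


def measure_throughput(operation, total_ops, concurrency):
    """Run total_ops operations on `concurrency` threads; return operations per second."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(operation, range(total_ops)))
    return total_ops / (time.perf_counter() - started)


def sweep(operation, total_ops, levels):
    """Measure throughput at each concurrency level, keeping everything else fixed."""
    return {level: measure_throughput(operation, total_ops, level) for level in levels}


if __name__ == "__main__":
    for level, ops_per_s in sweep(contended_operation, 200, [1, 2, 4, 8]).items():
        print(f"concurrency={level}: {ops_per_s:.0f} ops/s")
```

If throughput stays roughly flat as concurrency rises, as you would expect here, that hints that callers are queuing on a shared resource rather than doing work in parallel. Change only one variable per sweep so that any difference can be attributed to it.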

Evaluating the results

Ask yourself the following questions:

Do the results help us assess the risks we sought to address?
Do the results make sense?
What are the implications of the results?