

Soak testing explained: catching slow degradation and memory leaks over time

Some failures never show up in a ten-minute test. A memory leak, a connection that never closes, a cache that only grows: these surface after hours of steady traffic, not minutes. Soak testing holds a realistic load for hours or days to expose the slow degradation short tests miss, before your users meet it as a 3 a.m. outage.

Written by: Evaluat Staff

A soak test watches memory over hours of steady load. Healthy memory rises and falls back to a stable baseline; a leak climbs in a rising sawtooth that never comes back down and approaches an out-of-memory ceiling.

What is soak testing?

Soak testing, also called endurance testing, runs a steady, realistic load against a system for hours or days to see how it holds up over time. It looks for problems that build up slowly under continuous use: memory leaks, resource exhaustion, and gradual performance decline that a short test ends too soon to catch.

Most performance tests are sprints. A load test ramps traffic to your expected peak, holds it for a few minutes, confirms the system copes, and stops. A soak test is a marathon: you pick a load the system will realistically see, often its average day, and hold it there for hours or days while you watch memory, connections, and response time as the clock runs.

The name says what it does: you leave the system soaking in load, the way you might idle an engine for hours to see whether it overheats rather than flooring it once to see how fast it goes. The failures a soak test is built to find need time to accumulate; nothing dramatic happens in the first ten minutes.

Like other performance tests, a soak test is driven by virtual users, each one a simulated visitor following a scripted journey. The difference is duration. Grafana’s k6 documentation describes a soak test as holding average load for “several hours or even days” while you watch performance and resource use for signs of decline. That long hold is the method. Drop it to fifteen minutes and you are running a load test again.

Soak testing vs load and stress testing

Soak testing, load testing, and stress testing answer different questions. Load testing checks that you handle expected peak briefly. Stress testing pushes past that peak to find the breaking point. Soak testing holds a realistic load for a long time to find what degrades over hours. The first two are about intensity; soak testing is about duration.

The distinction matters because the three catch different bugs. A system can pass a load test and a stress test and still fail a soak test, because the thing that breaks it only appears after the eighth hour.

| | Load testing | Stress testing | Soak testing |
|---|---|---|---|
| Question it answers | Can we handle expected peak? | Where is the breaking point? | Does it stay healthy over time? |
| Load level | Expected peak | Beyond peak, until it breaks | Average, realistic everyday load |
| Duration | Minutes to about 2 hours | Until it breaks, then recovery | Hours to days (often 8 to 72 hours) |
| Typical finding | Slow queries, capacity gaps | Failure mode, recovery time | Memory leaks, resource exhaustion, drift |

For the full picture of how these fit together, see our guide to load vs stress vs performance testing. The short version: run a load test to prove you handle the traffic, a stress test to find the ceiling, and a soak test to prove the system survives a long shift at the traffic you actually get. Most teams run all three, because each answers a question the others cannot.

Why does soak testing matter?

Soak testing matters because the failures it catches reach production quietly and then take the whole system down at the worst time. A slow memory leak does not trip a short test. It waits until hour twenty, or the third day of a long weekend, and then the process runs out of memory while the on-call engineer is asleep.

The pattern is familiar to anyone who has run a service for a while: the app works fine, but only if it is restarted every night. That nightly restart is not a fix. It is a workaround for a leak nobody has found yet, and it hides the problem until the uptime is long enough that a restart no longer comes in time.

The cost of getting this wrong is concrete. In ITIC’s 2024 Hourly Cost of Downtime survey, which polled over 1,000 organisations between November 2023 and March 2024, more than 90% of mid-size and large enterprises said a single hour of downtime now costs them over $300,000, and 41% put the figure between $1 million and $5 million or more. A leak you find in a soak test is a bug ticket. The same leak found in production is an outage on that scale, on a schedule you do not control.

Soak testing matters most for systems expected to run continuously without hands-on support: a payment service over a holiday weekend, an API behind a mobile app around the clock, a checkout in a multi-day sale. Those are the ones that have to stay up for days without a safety net.

What does a soak test find?

A soak test finds the problems that grow slowly under continuous load and never appear in a short run. The headline ones are memory leaks and the resource leaks around them: database connections that are never returned, threads that pile up, file handles left open, logs and temporary data filling a disk, and caches that grow without ever evicting anything.

It helps to separate two things that look identical on a graph. A true memory leak is memory the program allocates and then loses track of, so it can never be reclaimed; it accumulates until there is none left. Unbounded growth is different: resources that grow by design but were never given a limit, like a cache with no eviction policy or a list that only gets appended to. The cause differs, but the soak-test signature is the same: a resource that climbs and never comes back down, which is why one long test catches both.
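To make the unbounded-growth case concrete, here is a minimal Python sketch of the cache example. Both classes are hypothetical, written only for illustration: the first grows by design and never evicts, the second caps itself and drops the least recently used entry. Under a short test they behave identically; under hours of traffic in which every request brings a new key, only the second stays flat.

```python
from collections import OrderedDict
from typing import Callable


class UnboundedCache:
    """Grows by design and never evicts: unbounded growth, not a true leak."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def get_or_compute(self, key: str, compute: Callable[[str], str]) -> str:
        if key not in self._items:
            self._items[key] = compute(key)
        return self._items[key]

    def __len__(self) -> int:
        return len(self._items)


class BoundedLRUCache:
    """Same interface, but never holds more than max_entries items."""

    def __init__(self, max_entries: int) -> None:
        self._items: OrderedDict[str, str] = OrderedDict()
        self._max_entries = max_entries

    def get_or_compute(self, key: str, compute: Callable[[str], str]) -> str:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = compute(key)
        self._items[key] = value
        if len(self._items) > self._max_entries:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    unbounded = UnboundedCache()
    bounded = BoundedLRUCache(max_entries=1_000)
    # A long run where every request carries a new key, such as a session ID.
    for request_id in range(50_000):
        key = f"session-{request_id}"
        unbounded.get_or_compute(key, str.upper)
        bounded.get_or_compute(key, str.upper)
    print(len(unbounded), len(bounded))  # 50000 1000
```

In real Python code, `functools.lru_cache(maxsize=...)` gives you the same bound for function results without writing the class yourself. The point of the sketch is that the limit, not the cache, is what keeps the soak-test line flat.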

A real example shows the shape. In November 2025, Freshworks engineers published a postmortem of a leak in a Java service: a refactor had started creating a fresh HTTP client and thread pool on every request instead of reusing one, and none were ever closed. The result was roughly 5,754 abandoned client objects holding about 63% of the heap, and pods killed for running out of memory even under light load. Their own finding is the lesson: the leak evaded short functional tests because it needed hours of continuous traffic to grow large enough to matter. That is the gap a soak test exists to close.
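The bug pattern is not specific to Java. The sketch below is a Python illustration under stated assumptions, not the Freshworks code: `HttpClient` is a hypothetical stand-in whose counter only drops on `close()`, modelling resources such as pool threads and sockets that outlive the code that created them. The leaky handler builds a new client per request and never closes it; the fixed one reuses a single long-lived client.

```python
class HttpClient:
    """Hypothetical client that owns a connection pool and worker threads.

    The counter drops only on close(), modelling resources (threads, sockets)
    that stay alive even after the code that created them forgets about them.
    """

    open_instances = 0

    def __init__(self) -> None:
        HttpClient.open_instances += 1
        self.closed = False

    def get(self, url: str) -> str:
        return f"200 OK {url}"  # placeholder for a real network call

    def close(self) -> None:
        if not self.closed:
            self.closed = True
            HttpClient.open_instances -= 1


def handle_request_leaky(url: str) -> str:
    # Bug pattern: a fresh client on every request, never closed.
    client = HttpClient()
    return client.get(url)


# Fix: one client created at startup and reused for the life of the process.
_SHARED_CLIENT = HttpClient()


def handle_request_shared(url: str) -> str:
    return _SHARED_CLIENT.get(url)
```

After a hundred calls to `handle_request_leaky`, `HttpClient.open_instances` has grown by a hundred; after a hundred calls to `handle_request_shared`, it has not moved. A short functional test leaves a handful of stray clients that nobody notices. A soak test that makes millions of requests does not.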

The front end is a long-running process too, and it degrades the same way. A single-page application a support agent keeps open all day can accumulate memory in the browser tab until the page slows to a crawl or the browser kills it. Kustomer’s engineering team documented exactly this in 2023: their web app got gradually slower as the work day went on, with per-session browser memory climbing from a few hundred megabytes in the morning to over a gigabyte by evening, until some users hit Chrome’s “page unresponsive” dialog. Refactoring the offending code dropped daily crashes from the hundreds to under fifty. None of that lives on the server, a point we will come back to.

How does a soak test work?

A soak test works by holding a steady, realistic load for a long time and watching resource trends, not just pass-or-fail results. You ramp virtual users up to an average production level, hold them there for hours or days, then ramp down. Throughout the hold, you record memory, connections, threads, and response time, watching for any line that climbs and never returns to baseline.
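Load tools describe that ramp, hold, and ramp-down shape as a sequence of stages (k6, for example, configures them through its `stages` option). As a tool-neutral illustration, here is a small Python function, with hypothetical parameter names, that returns how many virtual users should be active at a given second of the run.

```python
def target_vus(elapsed_s: float, ramp_up_s: float, hold_s: float,
               ramp_down_s: float, average_vus: int) -> int:
    """Virtual users that should be active elapsed_s seconds into a soak run.

    Shape: linear ramp up to average_vus, a steady hold, then a linear ramp down.
    """
    if elapsed_s < 0:
        return 0
    if elapsed_s < ramp_up_s:
        return round(average_vus * elapsed_s / ramp_up_s)
    if elapsed_s < ramp_up_s + hold_s:
        return average_vus  # the steady hold is where the soak data comes from
    end_s = ramp_up_s + hold_s + ramp_down_s
    if elapsed_s < end_s:
        return round(average_vus * (end_s - elapsed_s) / ramp_down_s)
    return 0
```

For an eight-hour hold at 200 users with ten-minute ramps, that is `target_vus(t, 600, 28_800, 600, 200)`: 100 users five minutes in, 200 for the entire hold, and back to zero once the ramp down finishes.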

The method is deliberately boring: no spike to catch, no breaking point to find. The steady state is the test, and the data that matters is the slope of each resource over time.

The single most useful thing to watch is the shape of memory over the run. Healthy memory is a sawtooth: it rises as the application does work, then drops back as the garbage collector reclaims what is no longer needed, settling around a stable baseline. A leak breaks that pattern. The low points of the sawtooth creep upward, the baseline drifts higher hour after hour, and the line never fully recovers. That upward drift, the line that never comes back down, is the signature every soak test is hunting for.
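One simple way to check for that drift is to ignore the peaks and track the floor. The Python sketch below assumes memory readings sampled at a fixed interval and a window wide enough to contain at least one garbage-collection cycle. It takes the lowest reading in each window as the post-collection baseline and flags the run if the baseline in its last third sits clearly above the baseline in its first third. The function names and the tolerance are illustrative choices, not a standard.

```python
from statistics import mean


def window_troughs(samples_mb: list[float], window: int) -> list[float]:
    """Lowest reading in each full window: the post-GC floor of the sawtooth."""
    return [min(samples_mb[i:i + window])
            for i in range(0, len(samples_mb) - window + 1, window)]


def baseline_is_drifting(samples_mb: list[float], window: int,
                         tolerance_mb: float) -> bool:
    """True if the floor late in the run sits clearly above the floor early on."""
    troughs = window_troughs(samples_mb, window)
    if len(troughs) < 3:
        raise ValueError("need at least three windows to compare")
    third = len(troughs) // 3
    early_floor = mean(troughs[:third])
    late_floor = mean(troughs[-third:])
    return late_floor - early_floor > tolerance_mb
```

A healthy sawtooth produces the same floor in every window. A leak whose floor rises a little each cycle trips the check, even though no single peak looks unusual.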

Two memory-over-time charts. Healthy memory rises and falls in a sawtooth that returns to a steady baseline after garbage collection. A leaking process climbs in a staircase that never returns to baseline and reaches an out-of-memory ceiling.

A worked example makes it concrete. Run 200 virtual users through a checkout journey for eight hours at the average rate your site sees on a normal day. For the first hour it looks like a clean load test: response times flat, memory sawtoothing around 2 GB. By hour four the baseline has crept to 4 GB and the 95th percentile response time, the experience of your slowest one in twenty users, is higher than it started. By hour seven memory is at 6 GB and climbing, the garbage collector is working harder, and response times are visibly worse. Nothing has crashed, and a fifteen-minute test would have passed, yet the slope tells you there is a leak that would have taken the service down on day two. That slope only appears under sustained load, and here it shows up twice: in a server-side metric (memory) and in a user-facing one (95th percentile response time), which is why a long run should watch both.

How long should a soak test run?

How long a soak test should run depends on how long your system has to stay up in production, but the practical range is hours to days. Four hours is a reasonable first test, eight to twenty-four hours is the common band, and systems that must run around the clock are often soaked for seventy-two hours to cover a full weekend without a restart.

The principle is simple: the test should run at least as long as the longest stretch your system runs untouched in production. If you deploy daily and restart in the process, a leak that takes thirty-six hours to bite may never matter. If your service runs for weeks between deploys, a four-hour test proves very little.

Start short and extend as you build confidence. A four-hour soak surfaces fast-growing leaks cheaply; once that is clean, stretch to an overnight run, then to the duration that matches your real uptime requirement. BrowserStack’s endurance-testing guide recommends seventy-two hours for systems that must run around the clock, because that span covers an unattended weekend, when thin staffing and long uptime combine to turn a slow leak into an outage.

| System type | Suggested first soak | Load level |
|---|---|---|
| Deploys daily, restarts often | 4 to 8 hours | Average daily traffic |
| Always-on API or SaaS | 24 hours | Average daily traffic |
| Must run unattended over a weekend | 72 hours | Average, including the quiet hours |

Two things stay fixed regardless of duration. The load level is average production traffic, not peak; soak testing is about time, not intensity. And the monitoring has to run for the whole test, sampling memory, connections, and response time continuously, because a soak test you do not watch is just an expensive way to keep a server busy.
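One common way to turn "average production traffic" into a virtual-user count is Little's Law: the number of users in a system equals the rate at which they arrive multiplied by how long each one stays. The Python sketch below assumes you can read daily sessions, busiest-hour sessions, and average session length from your analytics; the figures in it are made up for illustration.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600


def average_concurrency(sessions_per_day: float, avg_session_s: float) -> float:
    """Little's Law over a whole day: concurrent users = arrival rate x time in system."""
    return sessions_per_day / SECONDS_PER_DAY * avg_session_s


def hourly_concurrency(sessions_in_hour: float, avg_session_s: float) -> float:
    """The same law applied to a single hour, such as the busiest one."""
    return sessions_in_hour / SECONDS_PER_HOUR * avg_session_s


if __name__ == "__main__":
    # Hypothetical analytics figures: 172,800 sessions a day, 100 s per session,
    # and 20,000 of those sessions in the busiest hour.
    print(average_concurrency(172_800, 100))       # 200.0
    print(round(hourly_concurrency(20_000, 100)))  # 556
```

In this invented traffic pattern, sizing from the busiest hour would put about 556 users on the system instead of 200, holding peak traffic for the whole run rather than the average. Averaging over the full day, quiet hours included, is what keeps the soak realistic.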

Common soak testing mistakes

The mistakes that ruin a soak test mostly come down to running it too short, at the wrong load, or without watching the right things. A soak test earns its keep only if it runs long enough for slow problems to surface, holds a realistic load, and records resource trends across the whole run, not a single pass-or-fail result at the end.

  • Running it too short. A two-hour soak will miss a leak that takes six hours to show. Match the duration to your real uptime requirement, and when in doubt, run it longer.
  • Testing at peak instead of average. Soak testing is about duration, not intensity. Holding peak load for days tests the wrong thing and costs far more; use the average traffic the system actually sees. Pushing to the limit is stress testing, a different test for a different question.
  • Only checking pass or fail. The result of a soak test is the slope of each resource over time, not a green tick at the end. If you are not sampling memory, connections, and response time throughout, you cannot see the drift that is the whole point.
  • Resetting state between iterations. If every virtual user starts a brand-new process or clears all state each loop, leaks never get the chance to accumulate. The run has to let resources build the way they would in a long-lived production process.
  • Ignoring the front end. Most soak testing watches the server and stops there. A long-lived browser session degrades too, and that decline is invisible to any test that never opens a browser. More on that next.
  • Soaking production without a plan. A days-long test against live infrastructure can affect real users and real data. Use a production-like staging environment, or a controlled window with monitoring and a way to stop instantly.
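The point about letting state accumulate can be demonstrated in-process. This Python sketch is a miniature soak: it calls a handler many times inside one long-lived interpreter and samples memory after each round with the standard library's `tracemalloc`, which sees only Python-level allocations, not native memory. `leaky_handler`, `healthy_handler`, and `_audit_log` are hypothetical. Restarting the interpreter between rounds would reset the leaky handler's list and hide exactly the growth the test is meant to reveal.

```python
import tracemalloc
from typing import Callable

# Hypothetical module-level state that outlives every call.
_audit_log: list[bytes] = []


def leaky_handler(request_id: int) -> int:
    # Appends a 1 KiB record per call and never trims: it can only grow.
    _audit_log.append(bytes(1_024))
    return len(_audit_log)


def healthy_handler(request_id: int) -> int:
    # Allocates a 1 KiB scratch buffer that is freed when the call returns.
    scratch = bytes(1_024)
    return len(scratch)


def mini_soak(handler: Callable[[int], int], rounds: int,
              calls_per_round: int) -> list[int]:
    """Call handler repeatedly in one long-lived process, sampling traced memory each round."""
    tracemalloc.start()
    samples: list[int] = []
    try:
        for r in range(rounds):
            for i in range(calls_per_round):
                handler(r * calls_per_round + i)
            current_bytes, _peak_bytes = tracemalloc.get_traced_memory()
            samples.append(current_bytes)
    finally:
        tracemalloc.stop()
    return samples


if __name__ == "__main__":
    print("leaky:  ", mini_soak(leaky_handler, rounds=5, calls_per_round=200))
    print("healthy:", mini_soak(healthy_handler, rounds=5, calls_per_round=200))
```

The leaky handler's samples climb by roughly the same amount every round, while the healthy handler's stay flat. It is a small-scale version of the same slope test, useful for confirming a suspected leak before committing hours of load to it.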

How Evaluat fits into a soak test

Evaluat is the real-browser layer of a soak test: it runs each virtual user in its own isolated browser and holds that load over time, so the test sees whether the experience stays fast as the system runs hot, not just whether the server survived. It does not replace the server-side half, and the honest boundary matters, so this section draws it plainly.

A soak test built entirely on HTTP requests can watch a server’s memory and connection pools closely and still tell you nothing about what the page felt like at hour eight, because no browser ever ran. The front end is a long-running process with its own slow degradation, and the only way to measure it is to render the page under load, the whole time.

Evaluat is built on the real-browser model. Every virtual user is a real browser, each with its own memory, CPU, cache, and network stack, and the load shape (a ramp up, a steady hold, and a ramp down, at a target concurrency) is how you describe a load test, a stress test, or a soak test in one tool. Across the hold, it captures Core Web Vitals for every session, including Interaction to Next Paint, the metric that measures responsiveness and the first thing to suffer when a page is slowly choking on its own memory. Each session also keeps a video, a network log, and a console log.

That per-session record is what makes drift legible. If the experience degrades over a long run, the later sessions show it: their Vitals are worse than the first hour’s, and you can open the worst one, watch the video of the page going sluggish, and read the console for the warning that came with it. A slowdown at hour eight isn’t just a percentile. It’s a session you can replay. Comparing early-run sessions against late ones turns “the site felt slow by the afternoon” into specific, addressable evidence.
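The early-versus-late comparison is simple to compute once per-session numbers are exported. The Python sketch below assumes a list of session records, each with a start time and an INP value; `SessionRecord` is a hypothetical shape for illustration, not Evaluat's export format. It uses the 75th percentile because that is the level at which Google assesses Core Web Vitals, where an INP of 200 ms or less counts as good and more than 500 ms as poor.

```python
import math
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """Hypothetical per-session row; the field names are illustrative only."""
    started_at_s: float  # seconds since the test began
    inp_ms: float        # Interaction to Next Paint recorded for the session


def p75(values: list[float]) -> float:
    """Nearest-rank 75th percentile."""
    if not values:
        raise ValueError("no sessions in this slice of the run")
    ordered = sorted(values)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def compare_early_late(sessions: list[SessionRecord], run_length_s: float,
                       slice_s: float = 3_600) -> tuple[float, float]:
    """p75 INP for sessions started in the first slice_s vs the last slice_s of a run."""
    early = [s.inp_ms for s in sessions if s.started_at_s < slice_s]
    late = [s.inp_ms for s in sessions if s.started_at_s >= run_length_s - slice_s]
    return p75(early), p75(late)
```

For example, in an eight-hour run, four first-hour sessions with INP of 120, 150, 180, and 140 ms give a p75 of 150 ms, while four last-hour sessions at 260, 310, 520, and 290 ms give 310 ms: not yet poor, but past the good threshold, and the 520 ms session is the one worth opening first.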

The boundary is the honest part. Evaluat is not an application performance monitor. For the server side of a soak test, the heap graphs, connection-pool counters, and thread dumps, your APM and a protocol-level load tool like k6 are the right instruments, and cheaper per virtual user over a long run. The real-browser layer adds what an HTTP request never sees: whether the page your customer loads stays responsive after the system has run hot for hours. Run both, and one soak test covers the server and the experience at once.

Run the test that watches the clock

Soak testing is the test that asks the question the others skip: not can the system handle the traffic, but can it keep handling it. Set a realistic load, hold it for hours or days, watch every resource for the line that climbs and never comes back down, and match the duration to how long your system has to run untouched. The leak you find on hour seven of a test is a ticket. The one your users find on hour seven of a long weekend is an outage.

Evaluat runs every virtual user in a real browser and keeps the Core Web Vitals, session video, network logs, and console logs for each one, so a sustained run shows you whether the experience held up over time, not just whether the server did.

Test in real browsers. Debug in real sessions. Book a demo.

Common questions


What is soak testing?

Soak testing, also called endurance testing, runs a steady and realistic load against a system for an extended period, often eight to seventy-two hours, to see how it behaves over time. Short load tests confirm a system handles peak traffic for a few minutes. A soak test holds that traffic long enough to expose memory leaks, resource exhaustion, and slow performance degradation that only appear after hours of continuous use.

Is soak testing the same as endurance testing?

Yes. Soak testing and endurance testing are two names for the same thing: holding a sustained, realistic load for a long period to find issues that accumulate over time. Some teams also call it longevity or stability testing. The goal is identical in every case, which is to catch slow degradation a short test would never reach.

What is the difference between soak testing and load testing?

Load testing checks that a system handles its expected peak traffic, usually for a few minutes to a couple of hours. Soak testing holds a steady, realistic load for far longer, hours or days, to find problems that only surface over time, such as memory leaks and resource exhaustion. Load testing asks whether you can handle the traffic; soak testing asks whether you can keep handling it.

How long should a soak test run?

Long enough to surface degradation a short test would miss, tied to how long your system must run in production. Four hours is a reasonable starting point, eight to twenty-four hours is common, and systems that must run unattended around the clock are often soaked for seventy-two hours so they cover a full weekend. Run at average production load, not peak.

What does a soak test find?

Soak testing finds problems that accumulate slowly under sustained use. The classic ones are memory leaks, database connection or thread leaks, file-handle exhaustion, logs or temporary files filling a disk, caches that grow without eviction, and a gradual rise in response time. These are invisible to a short test because they take hours of continuous load to build up.

How do you detect a memory leak with a soak test?

You hold a steady load and watch memory usage over the whole run instead of just checking pass or fail. Healthy memory rises and falls as the garbage collector reclaims it, returning to a stable baseline. A leak shows up as memory that climbs and never returns to baseline, trending upward hour after hour until the process slows, restarts, or runs out of memory.

What load level should a soak test use?

Average production load, not peak. Soak testing is about duration, not intensity, so the point is to sustain a realistic everyday level of traffic for a long time rather than to push the system to its limit. Pushing to the breaking point is stress testing, which answers a different question and runs for a much shorter time.

When should you run a soak test?

Run a soak test before a release that changes how the system manages memory, connections, or other long-lived resources, and as a periodic check for anything expected to run continuously. Many teams soak after their load and stress tests have passed, so the test runs against a build already known to handle peak traffic. It matters most for systems where downtime is costly or that must run with thin support, such as overnight or over a weekend.
