
Blog

Product updates, engineering notes, and guides on testing performance in real browsers.

A fast backend is not a fast page. On the page-load timeline, the server response is a tiny sliver of about 50 milliseconds, while the browser then spends seconds downloading, parsing, running JavaScript, and painting before the page is usable at around 8 seconds. API performance testing measures only the server sliver; browser performance testing measures the whole wait. Figures are illustrative, drawn from a 2025 Catchpoint benchmark.

API performance testing vs browser performance testing: which your QA strategy needs

Your API responds in fifty milliseconds. Your page still takes eight seconds to feel ready. API performance testing and browser performance testing measure different layers of that gap, and your QA strategy needs both. Here is what each one catches, what it misses, and how to decide which to run first.

Evaluat Staff ·

A Core Web Vitals release gate: across pull requests, Largest Contentful Paint stays under a 2.3-second budget until one build regresses to 2.4 seconds and the gate blocks it, even though that is still within Google's 2.5-second good threshold.

Performance regression testing: making Core Web Vitals a CI/CD release gate

A green test suite proves your code is correct. It says nothing about whether the page got slower. Performance regression testing closes that gap: set Core Web Vitals budgets, measure every build against a baseline, and fail the pipeline when a change busts one. This guide wires that gate into CI/CD, from baselining main to the regressions only load reveals.

Evaluat Staff ·
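The release gate described above comes down to one comparison per build. Here is a minimal sketch. It assumes LCP has already been measured for the build, for example as a 75th-percentile value from a test run. The function names, the optional baseline check, and its 10 percent noise tolerance are hypothetical choices, not any specific tool's API. The numbers mirror the illustration: a 2.3-second budget blocks a 2.4-second build, even though 2.4 seconds is inside Google's 2.5-second good threshold.

```python
# Hypothetical CI gate: compare one build's LCP against a fixed budget and,
# optionally, against a baseline measured on main.
GOOD_LCP_S = 2.5  # Google's "good" LCP threshold, reported for reference only


def check_lcp_gate(measured_s, budget_s, baseline_s=None, tolerance=0.10):
    """Return a list of failure messages; an empty list means the build passes.

    tolerance is an assumed allowance for run-to-run noise relative to baseline.
    """
    failures = []
    if measured_s > budget_s:
        failures.append(
            f"LCP {measured_s:.2f}s exceeds budget {budget_s:.2f}s"
            f" (good threshold is {GOOD_LCP_S:.1f}s)"
        )
    if baseline_s is not None and measured_s > baseline_s * (1 + tolerance):
        failures.append(
            f"LCP {measured_s:.2f}s regressed more than {tolerance:.0%}"
            f" from baseline {baseline_s:.2f}s"
        )
    return failures


def exit_code(failures):
    """Print each failure and map the result to a process exit code."""
    for message in failures:
        print(f"FAIL: {message}")
    return 1 if failures else 0
```

`check_lcp_gate(2.4, 2.3)` returns one failure, so a CI step that exits with `exit_code(...)` blocks the build. `check_lcp_gate(2.2, 2.3)` returns an empty list and the build proceeds. Gating on a budget tighter than Google's threshold leaves headroom, so a regression is caught before it becomes a failing Core Web Vital.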

An Apdex score sorts every request into three buckets against a target time T: satisfied (at or under T, counted in full), tolerating (over T and up to 4T, counted as half), and frustrated (over 4T or errored, counted as zero). The formula (satisfied plus half the tolerating, all divided by the total) produces one user-satisfaction score between 0 and 1, shown here at 0.875.

What is an Apdex score? Measuring user satisfaction in performance testing

A load test can come back full of green percentiles and still not tell you whether the people behind them were satisfied or quietly giving up. An Apdex score answers that in one number from 0 to 1: you set a target response time, and it weighs the requests that left users satisfied against those that left them merely tolerating or frustrated.

Evaluat Staff ·
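The Apdex formula described above fits in a few lines. The sketch below is a minimal version, assuming response times in milliseconds and an optional per-request error flag. The function name and the sample data are hypothetical, and the data was chosen only to reproduce the 0.875 from the illustration.

```python
def apdex(response_times_ms, target_ms, errored=None):
    """Apdex = (satisfied + tolerating / 2) / total, a value between 0 and 1.

    satisfied:  t <= T
    tolerating: T < t <= 4T
    frustrated: t > 4T, or the request errored (counts as zero)
    """
    if not response_times_ms:
        raise ValueError("need at least one response time")
    if errored is None:
        errored = [False] * len(response_times_ms)
    if len(errored) != len(response_times_ms):
        raise ValueError("errored must have one flag per response time")
    satisfied = tolerating = 0
    for t, failed in zip(response_times_ms, errored):
        if failed or t > 4 * target_ms:
            continue  # frustrated: contributes nothing to the numerator
        if t <= target_ms:
            satisfied += 1
        else:
            tolerating += 1
    return (satisfied + tolerating / 2) / len(response_times_ms)


# Hypothetical sample: six requests at or under T=500 ms, two between 500 and 2000 ms.
times = [120, 300, 450, 500, 80, 200, 900, 1500]
score = apdex(times, target_ms=500)  # (6 + 2/2) / 8 = 0.875
```

Errored requests count as frustrated regardless of how fast they returned. This means a fast error page cannot inflate the score.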

Three test types: load testing ramps to a steady plateau, stress testing climbs past the breaking point, and performance testing is the umbrella over both.

Load testing vs stress testing vs performance testing: how the three actually differ

Three terms, endless confusion. Performance testing is the umbrella; load testing checks whether you survive the traffic you expect; stress testing pushes past that to find where you break. This guide shows how the three actually differ, when to run each, and which one your team needs first.

Evaluat Staff ·

Core Web Vitals at load: a page holds a good 2.1-second Largest Contentful Paint for one user, but as concurrent virtual users rise, LCP climbs and crosses the 2.5-second good threshold, ending near 3.4 seconds at 500 users.

Core Web Vitals at load, explained

A page can score green in a single-user Lighthouse run and still ship a red Largest Contentful Paint the moment real traffic arrives. Core Web Vitals change under load: the server slows, time to first byte grows, and interactions wait on a busy backend. This guide explains why each Vital moves under load, and how to measure them at concurrency.

Evaluat Staff ·

Interaction to Next Paint: a visit's interactions drawn as stacked bars of input delay, processing, and presentation time. Most stay under Google's 200ms good threshold while one slow interaction passes 500ms and sets the page's INP.

Interaction to Next Paint (INP), explained for engineers

A page can pass every functional test and still feel slow on the second tap. Interaction to Next Paint is the Core Web Vital that catches it: roughly the latency of your slowest interaction across a visit, timed from the click, tap, or key press to the next frame painted. Here is what INP captures, what drags it past 200ms, and how to test it under load.

Evaluat Staff ·
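As the illustration shows, each interaction's latency is the sum of input delay, processing, and presentation time. INP then reports roughly the worst interaction. Google's documentation adds one refinement: one highest interaction is ignored for every 50 interactions, so a single hiccup on a very busy page does not set the score. The sketch below is a simplified model. The class and function names are hypothetical, and it ignores browser-side details such as latency rounding and grouping events that belong to one interaction.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    input_delay_ms: float
    processing_ms: float
    presentation_ms: float

    @property
    def latency_ms(self):
        return self.input_delay_ms + self.processing_ms + self.presentation_ms


def inp_ms(interactions):
    """Worst latency, skipping one highest interaction per 50 interactions."""
    if not interactions:
        return None  # no interactions, no INP
    latencies = sorted((i.latency_ms for i in interactions), reverse=True)
    return latencies[len(latencies) // 50]


def rate_inp(value_ms):
    # Google's thresholds: good <= 200 ms, poor > 500 ms.
    if value_ms <= 200:
        return "good"
    if value_ms <= 500:
        return "needs improvement"
    return "poor"


# Hypothetical visit: two quick interactions and one slow one.
visit = [Interaction(20, 60, 40), Interaction(10, 30, 20), Interaction(150, 300, 80)]
```

Here `inp_ms(visit)` is 530, rated "poor": the one slow interaction sets the page's INP, just as in the illustration.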

A spike test load profile: virtual users jump from a flat baseline to a sudden peak, hold, then drop back, while server capacity rises too slowly to keep up, leaving a gap during the surge.

What is spike testing? Preparing for traffic surges and flash sales

A flash sale does not ramp up. Ten thousand people hit checkout in the same minute, and the autoscaler is still booting servers when the page falls over. Spike testing rehearses that surge on purpose, a sudden jump in traffic followed by a sudden drop, so you learn whether the site survives it before your customers find out for you.

Evaluat Staff ·

A soak test watches memory over hours of steady load. Healthy memory rises and falls back to a stable baseline; a leak climbs in a rising sawtooth that never comes back down and approaches an out-of-memory ceiling.

Soak testing explained: catching slow degradation and memory leaks over time

Some failures never show up in a ten-minute test. A memory leak, a connection that never closes, a cache that only grows: these surface after hours of steady traffic, not minutes. Soak testing holds a realistic load for hours or days to expose the slow degradation short tests miss, before your users meet it as a 3 a.m. outage.

Evaluat Staff ·
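The sawtooth in the illustration suggests a simple check. Healthy memory falls back to roughly the same trough after each garbage-collection cycle, whereas a leak's troughs keep rising. The sketch below is one minimal way to flag that pattern, assuming memory is sampled at a steady interval during the soak. The function names, the window size, and the growth threshold are hypothetical tuning choices, not a standard algorithm from any particular tool.

```python
def window_troughs(samples_mb, window):
    """Minimum memory reading in each consecutive window of samples."""
    return [min(samples_mb[i:i + window]) for i in range(0, len(samples_mb), window)]


def looks_like_leak(samples_mb, window, min_growth_mb):
    """Flag a leak when troughs never come back down and grow by min_growth_mb."""
    troughs = window_troughs(samples_mb, window)
    if len(troughs) < 2:
        return False  # not enough cycles observed to judge
    never_recovers = all(later >= earlier for earlier, later in zip(troughs, troughs[1:]))
    return never_recovers and troughs[-1] - troughs[0] >= min_growth_mb


# Hypothetical readings in MB, three samples per cycle.
healthy = [200, 260, 310, 205, 255, 300, 198, 262, 305]
leaking = [200, 260, 310, 240, 300, 350, 280, 340, 390]
```

The healthy series has troughs of 200, 205 and 198, so it returns to baseline and is not flagged. The leaking series has troughs of 200, 240 and 280, so it climbs every cycle. Comparing troughs rather than peaks avoids mistaking ordinary allocation bursts for a leak.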

A stress test: virtual users climb steadily while response time stays flat, then spikes sharply at the breaking point where the system starts to fail.

Stress testing a website: how to find the breaking point before your users do

Every website has a breaking point. The only question is whether you find it in a test or your users find it during a sale. Stress testing pushes the site past its limit on purpose, so you learn where it fails, how it fails, and how fast it recovers, before real traffic does. Here is how to run one.

Evaluat Staff ·

Smoke testing versus performance testing: a smoke test runs a quick pass-or-fail check that critical paths are not broken, while a performance test ramps many virtual users to measure how the system holds up under load.

Smoke testing vs performance testing: when a quick pre-release check is enough

Smoke testing and performance testing get treated as rivals, but they answer opposite questions. A smoke test asks whether a new build is broken. A performance test asks whether it stays fast and stable under load. This guide shows how the two differ, and when a quick pre-release check is genuinely enough.

Evaluat Staff ·

Load testing: concurrent virtual users ramp up, hold at a steady load, then ramp down, with each virtual user running in its own browser.

What is performance testing? A QA engineer's guide to testing under real traffic

Your app works fine for one user. Then a launch sends three thousand at once and pages crawl. Performance testing is how QA teams measure speed, stability, and scale under realistic traffic, deliberately, in a test rather than in production. This guide covers what performance testing is, its types, and when to run it.

Evaluat Staff ·

The cost of one virtual user. A protocol virtual user in a load tool like k6 costs about 1 to 5 megabytes, so one machine runs tens of thousands of them, shown as a dense field of dots. A real-browser virtual user driven by Playwright costs hundreds of megabytes and about one CPU core, so one machine runs only dozens to low hundreds, shown as a few browser windows. Roughly 50 to 100 times more compute per user.

Playwright for performance testing: can a browser automation tool drive virtual users?

You already know Playwright for end-to-end tests. Can you reuse it for performance testing and call each browser a virtual user? You can, but a real browser is expensive to run, so it drives a handful, not a flood. Here is how far Playwright scales, and where you reach for a different tool.

Evaluat Staff ·
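The cost gap in the illustration is easiest to see as capacity arithmetic. This sketch assumes a hypothetical load generator with 64 cores and 256 GB of memory. It uses the 5 MB upper end of the quoted range for a protocol virtual user, and a hypothetical 300 MB plus one core for each real-browser virtual user. The function names are illustrative. Real numbers depend on the page, the browser, and the script.

```python
import math


def vus_per_machine(mem_mb, cores, mem_per_vu_mb, cores_per_vu=None):
    """Virtual users that fit on one machine, limited by whichever resource runs out first.

    cores_per_vu=None treats the VU as memory-bound only, a simplification
    for lightweight protocol VUs.
    """
    by_memory = mem_mb // mem_per_vu_mb
    if cores_per_vu is None:
        return by_memory
    by_cpu = cores // cores_per_vu
    return min(by_memory, by_cpu)


def machines_needed(target_vus, per_machine):
    return math.ceil(target_vus / per_machine)


MACHINE_MEM_MB = 256 * 1024  # hypothetical 256 GB load generator
MACHINE_CORES = 64

protocol = vus_per_machine(MACHINE_MEM_MB, MACHINE_CORES, mem_per_vu_mb=5)
browser = vus_per_machine(MACHINE_MEM_MB, MACHINE_CORES, mem_per_vu_mb=300, cores_per_vu=1)
```

Under these assumptions, one machine holds about 52,000 protocol VUs but only 64 browser VUs, because CPU runs out long before memory. A 1,000-user real-browser test would therefore need 16 such machines.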

A release cycle drawn as five stages from planning to production. The weight of each performance test grows from light checks on every commit to a heavy real-browser load test at the pre-release gate, then tapers to monitoring in production.

Where does performance testing fit in an agile release cycle?

Agile teams ship every week, sometimes every day. Performance testing built for a quarterly release does not fit that rhythm, so it slides to the end, then to never, until production buckles. It does not have to. This guide maps each performance test to a stage: cheap checks on every commit, a real-browser load test at the pre-release gate, and monitoring after release.

Evaluat Staff ·

Core Web Vitals lab versus field data: a lab test is one synthetic point that scores good, while real-user field data is a wide distribution whose 75th percentile crosses the good threshold and fails, because real devices, networks, and traffic vary.

Core Web Vitals: why lab scores differ from real users

Your Lighthouse score says 98. Your Core Web Vitals report says the page is failing. Both can be right. A lab test measures one synthetic load on a fixed device and network; field data is the spread of every real device, connection, and click your users bring, including the traffic a lab never simulates. Here is why lab and field diverge, and which to trust.

Evaluat Staff ·
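Field data is judged at the 75th percentile, so a good lab number can sit alongside a failing assessment. The sketch below shows the comparison using Google's published LCP thresholds (good at or under 2.5 seconds, poor over 4.0 seconds). The 1.8-second lab value and the field samples are hypothetical, and the quartile comes from Python's `statistics.quantiles`. Production reports use far more samples, and their percentile method may differ slightly.

```python
import statistics

# Google's published LCP thresholds, in seconds.
LCP_GOOD_S = 2.5
LCP_POOR_S = 4.0


def rate_lcp(value_s):
    if value_s <= LCP_GOOD_S:
        return "good"
    if value_s <= LCP_POOR_S:
        return "needs improvement"
    return "poor"


def field_p75(samples_s):
    """75th percentile of real-user samples, the value Core Web Vitals assess."""
    # quantiles(n=4) returns the three quartile cut points; index 2 is p75.
    return statistics.quantiles(samples_s, n=4, method="inclusive")[2]


lab_lcp_s = 1.8  # hypothetical single synthetic run
field_lcp_s = [1.2, 1.6, 1.9, 2.2, 2.4, 2.7, 3.1, 3.8, 4.5]  # hypothetical real users
```

The lab run rates "good", but the field 75th percentile lands at 3.1 seconds and rates "needs improvement". Both numbers are correct; they simply describe different things.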

A right-skewed histogram of response times. Most requests cluster fast around a 100-millisecond median, with a long tail of slow ones stretching right. The mean, about 420 milliseconds, sits on the sparse downslope where almost no requests land, while p95 at 2,800 milliseconds and p99 at 5,200 milliseconds sit far out in the tail. The average describes no actual user.

Why average response time misleads you: reading p95 and p99

Your dashboard says average response time is 420 milliseconds. Half your requests finish in 100 milliseconds or less, one in a hundred takes over five seconds, and the average describes none of them. p95 and p99 read response time from the slow end, where the failures you run a performance test to find actually live.

Evaluat Staff ·
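A skewed sample shows how the mean drifts away from every real request. This sketch uses the nearest-rank percentile definition, which returns the smallest sample with at least p percent of samples at or below it. The data is hypothetical (90 fast requests and a slow tail), and the function name is illustrative. Monitoring tools often interpolate between samples instead, so their values can differ slightly.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in ms: 90 fast requests and a long slow tail.
samples = [100] * 90 + [1000] * 5 + [3000] * 4 + [6000]

mean = statistics.mean(samples)  # 320 ms: no request actually took this long
p50 = percentile(samples, 50)    # 100 ms
p95 = percentile(samples, 95)    # 1000 ms
p99 = percentile(samples, 99)    # 3000 ms
```

No request took 320 milliseconds. Nine in ten took 100, and the slowest one in a hundred took 3 seconds or more, which is the part of the distribution p95 and p99 expose.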

A performance test report read in three passes: did it keep up (active users, throughput, error rate), how slow was it really (response time percentiles, Time to First Byte), and what did users feel (Core Web Vitals, Apdex). The eighth metric, a per-URL and per-session breakdown, shows where it broke, flagging a checkout page at a 4.2 second Largest Contentful Paint.

8 metrics every performance test report should include

A performance test report full of green averages can still hide a checkout that buckled at peak. The numbers that catch it come in three passes: did the system keep up, how slow was it really, and what did users feel. Here are the eight metrics that answer those questions, and the benchmark that shows whether each is healthy.

Evaluat Staff ·

The four phases of Largest Contentful Paint shown as a timeline from navigation start to LCP: Time to First Byte about 40 percent, resource load delay under 10 percent, resource load duration about 40 percent, and element render delay under 10 percent, with a good LCP at 2.5 seconds or less.

Largest Contentful Paint (LCP), explained for engineers

Your Largest Contentful Paint is the moment the largest image or text block in the viewport, usually the hero image, finishes rendering, and Google treats it as a Core Web Vital. This guide explains what counts as the LCP element, the four phases LCP breaks into, why your lab and field numbers disagree, and how to fix and measure it under real load.

Evaluat Staff ·
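The four phases in the illustration are differences between four timestamps, and together they add up to LCP. This minimal sketch assumes you already have those timestamps in milliseconds from navigation start. In a browser they would come from the Navigation Timing, Resource Timing, and LCP performance entries. The dataclass, its field names, and the sample timings are hypothetical. For a text LCP element with no resource to fetch, the two resource phases are zero.

```python
from dataclasses import dataclass


@dataclass
class LcpTimings:
    """Milliseconds from navigation start (hypothetical field names)."""
    ttfb: float
    resource_request_start: float  # browser starts fetching the LCP resource
    resource_response_end: float   # LCP resource finishes downloading
    lcp: float                     # LCP element renders


def lcp_phases(t):
    """Split LCP into its four phases; the values sum to t.lcp."""
    return {
        "time_to_first_byte": t.ttfb,
        "resource_load_delay": t.resource_request_start - t.ttfb,
        "resource_load_duration": t.resource_response_end - t.resource_request_start,
        "element_render_delay": t.lcp - t.resource_response_end,
    }


timings = LcpTimings(ttfb=800, resource_request_start=950, resource_response_end=1800, lcp=2000)
phases = lcp_phases(timings)
```

These timings give phases of 800, 150, 850, and 200 milliseconds: 40, 7.5, 42.5, and 10 percent of a 2-second LCP. That split is close to the proportions in the illustration. Whichever phase is far over its share is where to look first.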

Three load-testing models compared: HTTP-script sends requests with no browser, shared-browser puts many virtual users in one browser, and real-browser gives each virtual user its own isolated browser. Only the real-browser model captures what users actually see.

Real-browser load testing, explained

Most load testing tools fire HTTP requests at your server. A few share one browser across many simulated users. Real-browser load testing gives every virtual user its own isolated browser, so it measures what your customers' browsers actually do under load. Here is how the three models differ, what each one can and cannot see, and when each is the right call.

Evaluat Staff ·

Functional testing versus performance testing: a functional test confirms that for a given input the software returns the correct output, pass or fail, while a performance test ramps many virtual users to measure whether the system holds up under load. Every release should answer both questions.

Functional testing vs performance testing: two questions every release should answer

A build can pass every functional test and still fall over the moment real traffic arrives. Functional testing answers one question: does your software do the right thing? Performance testing answers another: does it stay fast and stable under load? Every release has to answer both. This guide shows how the two differ, and where each one fits.

Evaluat Staff ·

Common questions

Performance testing FAQs

What is performance testing?

Performance testing measures how a website or app behaves under load: how fast it responds, how stable it stays, and where it breaks. It spans load, stress, spike, and soak testing, each answering a different question about behaviour under traffic.

What is the difference between load, stress, and spike testing?

Load testing checks behaviour at expected traffic. Stress testing pushes past capacity to find the breaking point. Spike testing applies a sudden surge to see how the system copes and recovers. Most teams run all three against the same journey.

What are Core Web Vitals and why do they matter under load?

Core Web Vitals are Google's user-experience metrics: Largest Contentful Paint (LCP), Interaction to Next Paint (INP), and Cumulative Layout Shift (CLS). They measure what users feel. Under load they often degrade long before the server returns errors, so measuring them in a real browser is what surfaces the problem.

Why test in a real browser instead of at the HTTP level?

HTTP-level tools stop at the server response. The seconds a user waits for the page to render, run JavaScript, and become interactive all happen afterward, in the browser. Real-browser testing captures that part, so the numbers match what users actually experience.

How often should you run performance tests?

Run them before major launches and traffic events, after significant changes to critical journeys, and on a regular cadence so regressions surface early. Continuous monitoring fills the gaps between scheduled tests.