Real-browser load testing, explained

Most load testing tools fire HTTP requests. A few share one browser across many simulated users. A real-browser load test runs each virtual user in its own browser. Here is what each model measures, where each one wins, and why the differences matter at peak.

The three load-testing models

Load testing tools fall into three architectural categories. They measure different things and they are not interchangeable.

HTTP-script load testing

The original model. The tool fires HTTP requests directly at the server, simulates user pacing in code, and measures server response time. k6, JMeter, Locust, Gatling, and Artillery are the well-known examples.

What you get: requests-per-second throughput, response-time distributions, error rates, status code histograms. Cheap, fast, and accurate for what it measures.

What you do not get: anything that happens in a browser. No HTML parsing, no JavaScript execution, no rendering. Web Vitals are unmeasured because the browser does not exist in the test. Third-party tags, A/B tests, consent banners, and analytics SDKs are invisible because they never run.
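As a rough illustration of the model, here is the shape of an HTTP-script load test in Python using only the standard library. It is a sketch, not how k6, JMeter, or any other named tool is implemented. The names run_load, http_fetch, percentile, and summarize are hypothetical, and the fetch function is injectable so the harness can be exercised without a live server. Note what gets measured: status codes and response times, and nothing that happens after the bytes arrive.

```python
import math
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_fetch(url):
    """Fire one GET and return the status code. No HTML parsing, no JavaScript."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def run_load(fetch, url, users, requests_per_user):
    """Run `users` concurrent workers; each sends `requests_per_user` requests."""
    def worker(_user_index):
        samples = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            try:
                status = fetch(url)
            except Exception:
                status = 0  # connection-level failure, no HTTP status available
            elapsed_ms = (time.perf_counter() - start) * 1000
            samples.append((status, elapsed_ms))
        return samples

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(worker, range(users)))
    return [sample for samples in per_user for sample in samples]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Reduce (status, latency_ms) samples to server-side metrics."""
    latencies = [ms for _, ms in samples]
    errors = sum(1 for status, _ in samples if status == 0 or status >= 400)
    return {
        "requests": len(samples),
        "error_rate": errors / len(samples),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
    }
```

Calling summarize(run_load(http_fetch, url, users=50, requests_per_user=20)) against a test environment would report throughput-style numbers for that URL. Everything in the report describes the server's response; no page is ever rendered.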

Shared-browser load testing

A middle ground. One browser process handles many simulated users by running multiple scenarios in parallel within the same Chromium instance. Lighter on infrastructure than running one browser per user.

What you get: some browser activity. JavaScript runs. The DOM renders. You can capture screenshots.

What you do not get: realistic contention. A single browser juggling 100 scenarios at once is not equivalent to 100 independent browsers running one scenario each. The CPU profile is wrong. The memory profile is wrong. The cache behaviour is wrong. The Web Vitals numbers that come out are optimistic in ways that production rarely respects.

Treat shared-browser tools as functional check harnesses, not as load-testing instruments.

Real-browser load testing

Each virtual user runs in its own isolated browser. Its own memory. Its own CPU allocation. Its own cache. Its own cookies. Its own network stack. Nothing crosses between them.
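A toy model makes the cache point concrete. The sketch below counts how many asset requests reach the origin when every user loads the same page, once with a single cache shared by all users and once with a cold cache per user. It rests on loud assumptions: the cache never evicts, every user requests the same four assets, and a shared-browser tool shares one HTTP cache across its simulated users (real tools vary in how much they isolate). The function name and asset list are hypothetical.

```python
def origin_fetches(users, assets, shared_cache):
    """Count requests that reach the origin when every user loads the same assets."""
    shared = set()
    fetched = 0
    for _ in range(users):
        # Isolated users each start with a cold cache; shared users reuse one.
        cache = shared if shared_cache else set()
        for asset in assets:
            if asset not in cache:
                fetched += 1
                cache.add(asset)
    return fetched


ASSETS = ["app.js", "app.css", "hero.jpg", "tag-manager.js"]  # illustrative only

print(origin_fetches(100, ASSETS, shared_cache=True))   # 4
print(origin_fetches(100, ASSETS, shared_cache=False))  # 400
```

Under these assumptions the shared model sends a hundredth of the asset traffic, which is one way a shared-browser run can look faster than the same load from independent browsers.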

What you get: numbers that match what your customers’ browsers would record at the same load. Web Vitals captured natively (LCP, INP, CLS, FCP). Third-party tag impact measured by definition, because the tags actually load and run. Per-session evidence: video, network log, console log, step playback.

What you do not get: API-layer load testing, protocol-level coverage (gRPC, MQTT, raw WebSocket), or the cheapest possible compute per virtual user.

When real-browser is the right call

The decision is usually about which question you need to answer.

  • “Will the page stay fast for users at peak?” Real-browser. HTTP-script will tell you the server stays fast. That is not the same question.
  • “Which third-party tag is costing us 600ms of LCP?” Real-browser. The tag has to run for you to measure it.
  • “What does the customer in Frankfurt actually see?” Real-browser. You need the rendering pipeline.
  • “Can our checkout survive 5,000 concurrent users?” Real-browser if the survival metric is user-visible (LCP, INP, completed sessions). HTTP-script if it is purely backend (database connections, error rate, queue depth).
  • “Can our /api/orders endpoint handle 50,000 RPS?” HTTP-script. There is no browser to run.

The honest answer for most teams is: both. Use HTTP-script tools against the API surface. Use real-browser tools against the customer-facing pages. They complement each other.
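The questions above reduce to two checks, which can be sketched as a small helper. This is only one reading of the list, not a rule from any tool; choose_model and its two parameters are hypothetical names.

```python
from enum import Enum


class Model(Enum):
    HTTP_SCRIPT = "http-script"
    REAL_BROWSER = "real-browser"


def choose_model(has_browser_surface, metric_is_user_visible):
    """Pick a load-testing model from the question being asked."""
    if not has_browser_surface:
        # A bare endpoint such as /api/orders has no browser to run.
        return Model.HTTP_SCRIPT
    if metric_is_user_visible:
        # LCP, INP, completed sessions, third-party tag cost.
        return Model.REAL_BROWSER
    # Backend-only survival metrics: database connections, error rate, queue depth.
    return Model.HTTP_SCRIPT


print(choose_model(has_browser_surface=True, metric_is_user_visible=True).value)
print(choose_model(has_browser_surface=False, metric_is_user_visible=False).value)
```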

What forensic detail looks like

Real-browser load testing produces evidence per session. Every virtual user in the run has:

  • A full video of the browser viewport for the duration of the session.
  • A network log of every HTTP request the browser made, with timing breakdown.
  • A console log of every browser message, error, and warning.
  • Step-level pass/fail timestamps for the scripted journey.
  • Core Web Vitals captured as the browser saw them.

This matters when something goes wrong for a slice of users. If 14 sessions out of 42,000 failed at the checkout step, p99 will not show it: that is about 0.03% of sessions, far smaller than the 1% tail that p99 sets aside. You need to open one of those 14 sessions and watch what happened. The HTTP-script equivalent is a request log; the shared-browser equivalent is an aggregate; the real-browser equivalent is the recorded session itself.
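The 14-in-42,000 case can be checked with synthetic data. In the sketch below, the Session fields, the step timings (800 ms for healthy sessions, 30,000 ms for broken ones), and the video paths are all invented for illustration; only the counts come from the example above. The point is that the nearest-rank p99 lands on a healthy session, while a filter on the per-session pass/fail flag finds the broken ones directly.

```python
import math
from dataclasses import dataclass


@dataclass
class Session:
    session_id: int
    checkout_ms: float     # time spent on the checkout step (synthetic)
    checkout_passed: bool  # step-level pass/fail for the scripted journey
    video_path: str        # hypothetical location of the session recording


def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(99 * len(ordered) / 100) - 1]


def build_run(total=42_000, failed=14):
    """Build a synthetic run in which the first `failed` sessions break at checkout."""
    sessions = []
    for i in range(total):
        broken = i < failed
        sessions.append(Session(
            session_id=i,
            checkout_ms=30_000.0 if broken else 800.0,
            checkout_passed=not broken,
            video_path=f"videos/session-{i}.webm",
        ))
    return sessions


sessions = build_run()
failures = [s for s in sessions if not s.checkout_passed]

print(len(failures) / len(sessions))            # share of sessions that failed
print(p99([s.checkout_ms for s in sessions]))   # 800.0, the failures are invisible
print([s.video_path for s in failures][:3])     # recordings to open first
```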

That is the structural advantage. The expensive part of debugging a production performance incident is finding the broken user. Real-browser load testing puts that user, with their full session, on the same page as the aggregate metric that surfaced the problem.

What it does not replace

HTTP-script tools remain the right answer for:

  • Pure API load tests (REST, gRPC, GraphQL, WebSocket).
  • Protocol-level coverage (MQTT, AMQP, JDBC, JMS, SOAP).
  • Very high concurrency on a tight budget when the user-facing detail is not required.
  • Existing investments in mature HTTP-script test suites that work well for your team.

Most teams arrive at the same answer: keep the HTTP-script suite for the API surface, add real-browser testing for the customer journey. The two report on different layers of the stack.

For a side-by-side comparison with a popular HTTP-script tool, see Evaluat vs k6. For methodology on measuring Vitals at load, see Core Web Vitals load testing.

Common questions

Is real-browser load testing more expensive than HTTP-script load testing?

Yes, per virtual user. Running a real browser costs more than firing HTTP requests, by roughly an order of magnitude in compute. The trade-off is that you get information HTTP scripts structurally cannot produce: Core Web Vitals under load, third-party tag impact, per-session video and console output. The question is which set of information you need.

Can I do real-browser load testing with open-source tools?

Playwright and Puppeteer drive browsers programmatically, and you can build a load-testing harness around them. Most teams who try this discover that the operational cost (browser pool management, video capture pipelines, log aggregation, report rendering) outweighs what they save by not paying for a hosted platform. k6 also includes a browser module, but browser testing is not the main thing k6 is built for.

Does real-browser load testing scale to tens of thousands of users?

It does, on enough infrastructure. A single load generator can run roughly 50 to 200 concurrent real browsers depending on memory. Hosted platforms scale horizontally across many generators. If your test target is 10,000 concurrent users, that is realistic. If it is 1 million, real-browser testing is the wrong tool; load at that scale is better generated at the API layer with HTTP scripts.
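The arithmetic behind that answer is simple enough to sketch. The helper below uses only the 50-to-200 browsers-per-generator range quoted above; generators_needed is a hypothetical name, and real capacity depends on the pages under test.

```python
import math


def generators_needed(target_users, browsers_per_generator):
    """Load generators required to run one real browser per virtual user."""
    return math.ceil(target_users / browsers_per_generator)


for users in (10_000, 1_000_000):
    best_case = generators_needed(users, 200)   # 200 browsers per generator
    worst_case = generators_needed(users, 50)   # 50 browsers per generator
    print(f"{users:,} users: {best_case:,} to {worst_case:,} generators")
```

On these figures, 10,000 users needs 50 to 200 generators, while 1 million needs 5,000 to 20,000.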

Do I still need HTTP-script load testing if I have real-browser?

For pure API endpoints, yes. REST, gRPC, WebSocket, and MQTT load tests have no browser to run, so the extra cost of real browsers is wasted. Most teams end up running k6 or JMeter against the API surface and a real-browser tool against the customer-facing pages. The two test different layers.
