The three load-testing models
Load testing tools fall into three architectural categories. They measure different things and they are not interchangeable.
HTTP-script load testing
The original model. The tool fires HTTP requests directly at the server, simulates user pacing in code, and measures server response time. k6, JMeter, Locust, Gatling, and Artillery are the well-known examples.
What you get: requests-per-second throughput, response-time distributions, error rates, status code histograms. Cheap, fast, accurate for what it measures.
What you do not get: anything that happens in a browser. No HTML parsing, no JavaScript execution, no rendering. Web Vitals are unmeasured because the browser does not exist in the test. Third-party tags, A/B tests, consent banners, and analytics SDKs are invisible because they never run.
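A minimal sketch of the model, using only the Python standard library rather than any of the tools named above. The stub server, the `virtual_user` and `run_load_test` names, and the user counts are all assumptions made for illustration. The point to notice is that the response body is downloaded and discarded: the script tag in it never executes.

```python
import statistics
import threading
import time
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the system under test."""

    def do_GET(self):
        body = b"<html><script>/* never executed by the load tool */</script></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def virtual_user(url, iterations, think_time_s):
    """One scripted user: request, record status and latency, pause, repeat."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                resp.read()  # bytes are downloaded, never parsed or rendered
                status = resp.status
        except urllib.error.HTTPError as err:
            status = err.code  # 4xx/5xx still count as measured responses
        samples.append((status, time.perf_counter() - start))
        time.sleep(think_time_s)  # user pacing is simulated in code
    return samples


def run_load_test(url, users=5, iterations=4, think_time_s=0.01):
    """Run `users` scripted users concurrently and aggregate server-side metrics."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(virtual_user, url, iterations, think_time_s)
                   for _ in range(users)]
        samples = [s for f in futures for s in f.result()]
    elapsed = time.perf_counter() - started
    latencies = [latency for _, latency in samples]
    return {
        "requests": len(samples),
        "rps": len(samples) / elapsed,
        "status_histogram": Counter(status for status, _ in samples),
        "p50_s": statistics.median(latencies),
        "max_s": max(latencies),
    }


def demo():
    """Start the stub server on a free local port and run a small test against it."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return run_load_test(f"http://127.0.0.1:{server.server_port}/")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Everything in the result describes the server: throughput, latency, status codes. Nothing in it describes what a browser would have done with the response.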
Shared-browser load testing
A middle ground. One browser process handles many simulated users by running multiple scenarios in parallel within the same Chromium instance. Lighter on infrastructure than running one browser per user.
What you get: some browser activity. JavaScript runs. The DOM renders. You can capture screenshots.
What you do not get: realistic contention. A single browser juggling 100 scenarios at once behaves nothing like 100 independent browsers running one scenario each. The CPU profile is wrong. The memory profile is wrong. The cache behaviour is wrong. The Web Vitals numbers that come out are optimistic in ways that production rarely respects.
Treat shared-browser tools as functional check harnesses, not as load-testing instruments.
Real-browser load testing
Each virtual user runs in its own isolated browser. Its own memory. Its own CPU allocation. Its own cache. Its own cookies. Its own network stack. Nothing crosses between them.
What you get: numbers that match what your customers’ browsers would record at the same load. Web Vitals captured natively (LCP, INP, CLS, FCP). Third-party tag impact measured by definition, because the tags actually load and run. Per-session evidence: video, network log, console log, step playback.
What you do not get: API-layer load testing (REST, gRPC, GraphQL, raw WebSocket), protocol-level coverage (MQTT, AMQP), or the cheapest possible compute per virtual user.
When real-browser is the right call
The decision is usually about which question you need to answer.
- “Will the page stay fast for users at peak?” Real-browser. HTTP-script will tell you the server stays fast. That is not the same question.
- “Which third-party tag is costing us 600 ms of LCP?” Real-browser. The tag has to run for you to measure it.
- “What does the customer in Frankfurt actually see?” Real-browser. You need the rendering pipeline.
- “Can our checkout survive 5,000 concurrent users?” Real-browser if the survival metric is user-visible (LCP, INP, completed sessions). HTTP-script if it is purely backend (database connections, error rate, queue depth).
- “Can our /api/orders endpoint handle 50,000 RPS?” HTTP-script. There is no browser to run.
The honest answer for most teams is: both. Use HTTP-script tools against the API surface. Use real-browser tools against the customer-facing pages. They complement each other.
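The questions above reduce to two checks: is there a browser-rendered page in the path, and is the pass/fail metric something the user sees? A rough sketch of that rule follows. The `Model` enum and `choose_model` function are hypothetical names for illustration, not part of any tool.

```python
from enum import Enum


class Model(Enum):
    HTTP_SCRIPT = "http-script"
    REAL_BROWSER = "real-browser"


def choose_model(target_is_page: bool, metric_is_user_visible: bool) -> Model:
    """Pick a load-testing model from the question being asked.

    target_is_page: the system under test is a rendered page, not a bare endpoint.
    metric_is_user_visible: success is judged by LCP, INP, completed sessions, etc.
    """
    if target_is_page and metric_is_user_visible:
        return Model.REAL_BROWSER
    # Bare endpoints and purely backend metrics (error rate, queue depth,
    # database connections) need no browser.
    return Model.HTTP_SCRIPT


# "Will the page stay fast for users at peak?"
print(choose_model(target_is_page=True, metric_is_user_visible=True).value)
# "Can our /api/orders endpoint handle 50,000 RPS?"
print(choose_model(target_is_page=False, metric_is_user_visible=False).value)
# Checkout at 5,000 users, judged only on database connections and queue depth
print(choose_model(target_is_page=True, metric_is_user_visible=False).value)
```

A real test plan usually contains questions of both kinds, which is why the answer tends to be both tools.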
What forensic detail looks like
Real-browser load testing produces evidence per session. Every virtual user in the run has:
- A full video of the browser viewport for the duration of the session.
- A network log of every HTTP request the browser made, with timing breakdown.
- A console log of every browser message, error, and warning.
- Step-level pass/fail timestamps for the scripted journey.
- Core Web Vitals captured as the browser saw them.
This matters when something goes wrong for a slice of users. If 14 sessions out of 42,000 failed at the checkout step, p99 will not show it: 14 sessions is about 0.03% of the run, and p99 only marks the boundary of the slowest 1%. You need to open one of those 14 sessions and watch what happened. The HTTP-script equivalent is a request log; the shared-browser equivalent is an aggregate; the real-browser equivalent is the recorded session itself.
That is the structural advantage. The expensive part of debugging a production performance incident is finding the broken user. Real-browser load testing puts that user, with their full session, on the same page as the aggregate metric that surfaced the problem.
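To see why the aggregate hides those 14 sessions, here is a small sketch on synthetic data. The durations are invented for illustration (healthy sessions between 1 and 4 seconds, failed sessions assumed to hit a 30-second timeout), and the `Session` record is a hypothetical shape, not any tool's export format.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Session:
    session_id: int
    duration_s: float
    failed_step: Optional[str]  # None when the journey completed


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def build_run(total=42_000, failed=14, seed=7):
    """Synthetic run: healthy sessions take 1-4 s, failed ones hit a 30 s timeout."""
    rng = random.Random(seed)
    sessions = [Session(i, rng.uniform(1.0, 4.0), None)
                for i in range(total - failed)]
    sessions += [Session(total - failed + i, 30.0, "checkout")
                 for i in range(failed)]
    return sessions


sessions = build_run()
durations = [s.duration_s for s in sessions]
p99_s = percentile(durations, 99)
failed_sessions = [s for s in sessions if s.failed_step is not None]
failure_share = len(failed_sessions) / len(sessions)

# p99 sits at rank 41,580 of 42,000, well below the 14 failed sessions at the top.
print(f"p99 duration: {p99_s:.2f} s")
print(f"slowest session: {max(durations):.1f} s")
print(f"failed sessions: {len(failed_sessions)} ({failure_share:.3%})")
print("session ids to replay:", [s.session_id for s in failed_sessions])
```

The p99 figure stays inside the healthy band while the maximum sits at the timeout. Only the per-session filter on the last line surfaces the users who actually failed.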
What it does not replace
HTTP-script tools remain the right answer for:
- Pure API load tests (REST, gRPC, GraphQL, WebSocket).
- Protocol-level coverage (MQTT, AMQP, JDBC, JMS, SOAP).
- Very high concurrency on a tight budget when the user-facing detail is not required.
- Existing investments in mature HTTP-script test suites that work well for your team.
Most teams arrive at the same answer: keep the HTTP-script suite for the API surface, add real-browser testing for the customer journey. The two report on different layers of the stack.
For a side-by-side comparison with a popular HTTP-script tool, see Evaluat vs k6. For methodology on measuring Vitals at load, see Core Web Vitals load testing.