Playwright for performance testing: can a browser automation tool drive virtual users?

You already know Playwright for end-to-end tests. Can you reuse it for performance testing and call each browser a virtual user? You can, but a real browser is expensive to run, so it drives a handful, not a flood. Here is how far Playwright scales, and where you reach for a different tool.

Written by: Evaluat Staff

Figure: The cost of one virtual user. A protocol virtual user in a load tool like k6 costs about 1 to 5 megabytes, so one machine runs tens of thousands of them (shown as a dense field of dots). A real-browser virtual user driven by Playwright costs hundreds of megabytes and about one CPU core, so one machine runs only dozens to low hundreds (shown as a few browser windows). That is roughly 50 to 100 times more compute per user.

Can Playwright do performance testing?

Yes for measurement, no for load. Playwright can measure performance: because it drives a real browser, a run can capture what a user actually experiences, including Core Web Vitals. What it cannot do on its own is generate load. Playwright has no concept of a virtual user, so simulating concurrent traffic takes extra tooling.

Start with the tool itself. Playwright is an open-source framework from Microsoft for driving a browser programmatically. Its own documentation calls it “an end-to-end test framework for modern web apps”: you write a script, and Playwright opens Chromium, Firefox, or WebKit and clicks, types, and navigates the way a person would. The documentation never mentions load testing or virtual users, because that was never its job.

Two terms are worth pinning down. Performance testing is the umbrella for measuring how fast a system responds. Load testing is one kind of performance test: you apply concurrent traffic and watch what happens to speed and stability as it climbs. The traffic in a load test is made of virtual users, each one a simulated visitor the tool controls.

Virtual users come in two kinds, and the difference decides what Playwright can and cannot do. A protocol virtual user fires HTTP requests straight at the server and measures the reply. A real-browser virtual user runs an actual browser and measures what renders. That distinction is the whole subject of real-browser load testing; here it is enough to know that Playwright, by driving a real browser, sits firmly on the real-browser side.

That is also what makes it useful for performance work. Point Playwright at a page and your script can query the same metrics the browser exposes to the page itself: Core Web Vitals like Largest Contentful Paint and Interaction to Next Paint (through the browser's performance APIs), navigation timing, and every network request and console message. That is real performance data from a real rendering engine, which a request-only tool cannot produce.
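As a minimal sketch of what "navigation timing" means in practice, the function below turns one navigation entry into the durations a report usually shows. The field names match the browser's PerformanceNavigationTiming entry; the interface names, the helper name, and the sample values are assumptions made for illustration, not measurements.

```typescript
// Subset of the browser's PerformanceNavigationTiming fields used here.
// All values are milliseconds relative to the start of the navigation.
interface NavigationSample {
  startTime: number;
  responseStart: number;
  domContentLoadedEventEnd: number;
  loadEventEnd: number;
}

interface NavigationSummary {
  ttfbMs: number;
  domContentLoadedMs: number;
  loadMs: number;
}

// Turn one raw navigation entry into three commonly reported durations.
function summarizeNavigation(entry: NavigationSample): NavigationSummary {
  return {
    ttfbMs: entry.responseStart - entry.startTime,
    domContentLoadedMs: entry.domContentLoadedEventEnd - entry.startTime,
    loadMs: entry.loadEventEnd - entry.startTime,
  };
}

// In a Playwright script the entry could be fetched roughly like this:
//   const entry = await page.evaluate(
//     () => performance.getEntriesByType("navigation")[0].toJSON()
//   );
// The sample below is made-up data so the sketch runs without a browser.
const sampleEntry: NavigationSample = {
  startTime: 0,
  responseStart: 180,
  domContentLoadedEventEnd: 950,
  loadEventEnd: 1600,
};
const navigationSummary = summarizeNavigation(sampleEntry);
```

Keeping the arithmetic in a plain function, separate from the browser call, means the same summary logic can be applied to entries collected from any number of sessions.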

Why can’t you run ten thousand Playwright browsers?

Because each Playwright virtual user is a whole browser, and a browser is heavy. A protocol load tool spends a few megabytes per virtual user; a real browser spends hundreds of megabytes and close to a full CPU core. That is roughly fifty to a hundred times more compute per user, so the same machine runs far fewer of them.

Put numbers on the cheap side first. Grafana’s k6, a popular protocol load tool, reports that, as of 2026, a single instance can drive 30,000 to 40,000 virtual users and up to 300,000 requests per second, because a simple protocol virtual user costs only about 1 to 5MB of memory. At that price you can hold tens of thousands of them in RAM on one box.

A real browser is a different order of cost. Independent measurements put a headless Chromium instance at roughly 50 to 150MB before it loads a page, climbing into the hundreds of megabytes once it renders. On top of memory it needs CPU to parse HTML, execute JavaScript, and paint. A widely used rule of thumb is about one CPU core per concurrent browser.

The strain shows up fast in practice. In one reported case on Grafana’s forum, a k6 browser test ran cleanly at 5 virtual users but began throwing errors past 20 on an 8-core, 64GB machine, with the browsers alone consuming over 3GB of memory and pinning the CPU. That is the wall every browser-based load test eventually meets: not a licensing limit, a hardware one.

Do the division and the ceiling is clear. A generator that holds tens of thousands of protocol users holds dozens to low hundreds of real browsers. In practice that is rarely a problem: a pool of 50 to 100 concurrent sessions is usually enough to characterize the experience. Run more than the machine can feed and the browsers starve, which distorts the very metrics you came to measure.
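The division can be sketched as a small calculation: a generator's ceiling is whichever of memory or CPU runs out first. The per-user costs below are assumptions chosen inside the ranges quoted above (5 MB per protocol user, 300 MB and one core per browser); the figure of 5,000 protocol users per core is purely hypothetical, since "a fraction of a core" is not quantified here.

```typescript
interface LoadGenerator {
  cores: number;
  memoryMb: number;
}

interface UserCost {
  memoryMb: number;     // memory one virtual user holds
  usersPerCore: number; // how many virtual users one core can feed
}

interface UserCeiling {
  byMemory: number;
  byCpu: number;
  maxUsers: number;
}

// The ceiling is the smaller of the memory bound and the CPU bound.
function userCeiling(machine: LoadGenerator, cost: UserCost): UserCeiling {
  const byMemory = Math.floor(machine.memoryMb / cost.memoryMb);
  const byCpu = Math.floor(machine.cores * cost.usersPerCore);
  return { byMemory, byCpu, maxUsers: Math.min(byMemory, byCpu) };
}

// An 8-core, 64 GB machine, like the one in the forum report.
const eightCoreBox: LoadGenerator = { cores: 8, memoryMb: 64 * 1024 };

// Assumed costs; usersPerCore for protocol users is a hypothetical figure.
const protocolUser: UserCost = { memoryMb: 5, usersPerCore: 5000 };
const browserUser: UserCost = { memoryMb: 300, usersPerCore: 1 };

const protocolCeiling = userCeiling(eightCoreBox, protocolUser);
const browserCeiling = userCeiling(eightCoreBox, browserUser);
```

Under these assumptions the protocol users are bound by memory and the browsers by CPU: the browser ceiling equals the core count long before the memory runs out, which is why the one-core-per-browser rule of thumb dominates the sizing.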

| | Protocol virtual user | Real-browser virtual user |
| --- | --- | --- |
| What it is | An HTTP client firing requests | A full browser (Chromium, Firefox, WebKit) |
| Memory per user | ~1-5 MB | Hundreds of MB |
| CPU per user | A fraction of a core | ~1 core |
| Users per generator | Tens of thousands | Dozens to low hundreds |
| What it measures | Server response | What the page renders |

How do you load test with Playwright in practice?

You pair Playwright with an orchestrator that turns one script into many concurrent virtual users. Artillery is the common choice: its Playwright engine spawns the browsers, ramps the traffic, and aggregates the results. Grafana k6 ships a similar browser mode. For more load than one machine holds, you run the same test across a fleet of cloud workers.

Artillery and Playwright

Playwright on its own has no virtual users, no ramp, and no aggregated report; it runs one script in one browser. Artillery, an open-source load testing toolkit, fills that gap with a Playwright engine that, in its own words, “takes care of setting up headless browsers, running your Playwright test code, and collecting and emitting performance metrics.” You define traffic phases, for example ramp from 0 to 100 virtual users over five minutes, and Artillery launches a browser per virtual user, replays your Playwright steps in each, and reports Web Vitals (LCP, CLS, INP, TTFB, FCP, FID) as min, max, mean, p95, and p99.
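To make the aggregation step concrete, here is a minimal sketch of how per-session samples of one metric collapse into min, max, mean, p95, and p99. It uses the nearest-rank percentile method, which is an assumption for illustration; Artillery's own calculation may differ. The sample values are synthetic.

```typescript
interface MetricSummary {
  min: number;
  max: number;
  mean: number;
  p95: number;
  p99: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function nearestRankPercentile(sortedAscending: number[], p: number): number {
  const rank = Math.ceil((p * sortedAscending.length) / 100);
  return sortedAscending[Math.max(rank, 1) - 1];
}

// Collapse one metric's samples (for example LCP in ms, one per session).
function summarizeMetric(values: number[]): MetricSummary {
  if (values.length === 0) {
    throw new Error("need at least one sample");
  }
  const sorted = [...values].sort((a, b) => a - b);
  const total = sorted.reduce((sum, v) => sum + v, 0);
  return {
    min: sorted[0],
    max: sorted[sorted.length - 1],
    mean: total / sorted.length,
    p95: nearestRankPercentile(sorted, 95),
    p99: nearestRankPercentile(sorted, 99),
  };
}

// Synthetic LCP samples: 100 sessions from 1000 ms to 2980 ms in 20 ms steps.
const lcpSamplesMs = Array.from({ length: 100 }, (_, i) => 1000 + i * 20);
const lcpSummary = summarizeMetric(lcpSamplesMs);
```

The gap between the mean and the p95 or p99 is the reason to report percentiles at all: a handful of slow sessions barely moves the mean but shows up clearly in the tail.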

There is a fidelity trade-off in how it does that. By default, as of 2026, Artillery gives each virtual user its own browser context rather than a separate browser. A context is an isolated session (its own cookies, cache, and storage) inside a shared browser process, so it is cheaper than a full browser but not as independent: many contexts share one process and its CPU. Artillery warns that a full browser per user “will require a lot more CPU and memory and is not recommended for most tests,” and it records traces for only five virtual users at a time by default. Those defaults exist precisely because real browsers are expensive.

k6 browser mode

Grafana’s k6 reaches the same place from the protocol side. Its browser module drives a real Chromium-based browser from a k6 script and reports Web Vitals such as LCP, CLS, FCP, and TTFB per page. It is not Playwright, but it is the same idea: a real rendering engine standing in for a user. If your team already runs k6 for API load, its browser mode adds a browser layer without a second tool.

Scaling the browser fleet

One generator caps out at dozens to low hundreds of browsers, so larger browser-based tests run horizontally: the same script on many workers, often on cloud container services, with the results merged. This is how a browser-based test reaches thousands of concurrent users. It works, but you are now operating a distributed browser fleet, with all the pooling, scheduling, and log collection that implies.

A concrete shape makes the sizing real. Say you want to know whether checkout stays fast at 100 concurrent users. You write the journey in Playwright (open the product page, add to cart, check out), wrap it in an Artillery config that ramps to 100 virtual users over 10 minutes, and read the p95 Largest Contentful Paint and INP per step. At roughly one core per browser, 100 browsers is well past what an 8-core machine feeds comfortably, so in practice you size the generator up or split the run across workers. That math is the part teams underestimate.
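The sizing math for that checkout example can be sketched in a few lines. The function names are hypothetical, the ramp is a simplified linear model rather than Artillery's actual scheduler, and the one-core-per-browser figure is the rule of thumb quoted earlier, not a measurement.

```typescript
// How many browsers one worker can feed at a given CPU cost per browser.
function browsersPerWorker(coresPerWorker: number, coresPerBrowser: number): number {
  return Math.floor(coresPerWorker / coresPerBrowser);
}

// How many workers a target concurrency needs, rounding up.
function workersNeeded(
  targetUsers: number,
  coresPerWorker: number,
  coresPerBrowser: number = 1
): number {
  const perWorker = browsersPerWorker(coresPerWorker, coresPerBrowser);
  if (perWorker < 1) {
    throw new Error("a worker must be able to run at least one browser");
  }
  return Math.ceil(targetUsers / perWorker);
}

// Simplified linear ramp: active users after `elapsedSeconds` of the ramp.
function usersAtSecond(
  elapsedSeconds: number,
  rampSeconds: number,
  targetUsers: number
): number {
  const ramped = Math.floor((targetUsers * elapsedSeconds) / rampSeconds);
  return Math.min(targetUsers, ramped);
}

// 100 concurrent users on 8-core workers, ramping over 10 minutes.
const checkoutWorkers = workersNeeded(100, 8);
const usersAtHalfway = usersAtSecond(300, 600, 100);
```

Rounding the worker count up rather than down is the conservative choice here, because an oversubscribed worker slows its own browsers and skews the metrics.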

Playwright, protocol tools, or a real-browser platform: which fits?

Match the tool to the question. For functional checks (does the flow work?), use Playwright by itself. For raw volume against an API, use a protocol tool like k6 or JMeter. For how the page feels under load, use real browsers as virtual users. Most teams need a mix, weighted toward protocol traffic.

| | Playwright alone | Protocol load tool (k6, JMeter) | Real-browser platform |
| --- | --- | --- | --- |
| Drives | One real browser | HTTP and protocol requests | One isolated browser per virtual user |
| Measures | One session’s experience | Server response, throughput, errors | Experience under load, per session |
| Virtual users | None built in | Tens of thousands per machine | Dozens to low hundreds, scaled out |
| Per-session forensics | Manual traces you wire up | None | Built in (video, network, console) |
| Setup cost | Low (it is one script) | Low to moderate | Managed |
| Best for | Functional end-to-end tests | API and protocol volume | User-facing journeys at load |

The vendors say the same thing in their own docs. Grafana’s k6 docs recommend, as of 2026, a hybrid where browser virtual users are 10% or less of the load and protocol virtual users carry the other 90%: the protocol layer generates the volume cheaply, while a thin slice of real browsers watches the experience. That ratio is a sound default. You are not choosing one model; you are deciding how much of each to run.
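The split is simple arithmetic, sketched below under the assumption that the browser share is expressed as a whole percentage and rounded down, so the browser slice never exceeds the stated cap. The function and field names are hypothetical.

```typescript
interface HybridSplit {
  browserUsers: number;
  protocolUsers: number;
}

// Split a total virtual-user count into a browser slice and a protocol slice.
// The browser slice is rounded down so it stays at or below browserPercent.
function splitHybridLoad(totalUsers: number, browserPercent: number = 10): HybridSplit {
  if (browserPercent < 0 || browserPercent > 100) {
    throw new RangeError("browserPercent must be between 0 and 100");
  }
  const browserUsers = Math.floor((totalUsers * browserPercent) / 100);
  return { browserUsers, protocolUsers: totalUsers - browserUsers };
}

// A 2,000-user test at the 10% default.
const hybridPlan = splitHybridLoad(2000);
```

For a 2,000-user test this yields 200 browser users and 1,800 protocol users, and it is the 200 browsers, not the 1,800 protocol users, that decide how many generator cores the run needs.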

Where does Playwright land in that picture? On its own it is a functional testing tool, and a good one. Bolted to an orchestrator it becomes the browser slice of a hybrid load test. And if that browser slice is the part you care about most, a platform built for it can be less work than assembling Playwright, an orchestrator, tracing, and a reporting pipeline yourself. For a protocol tool set beside the real-browser model, see Evaluat vs k6.

Common mistakes when load testing with Playwright

The recurring error is treating Playwright as a volume generator. It is a precision instrument for browser experience, not a firehose. The mistakes below come from pushing it past that role or from mis-sizing the rig, and each one quietly corrupts the numbers you are trying to trust.

  • Using it for raw concurrency. Playwright shines at a few dozen to a few hundred realistic sessions, not at tens of thousands of hits. If you need volume, generate it with a protocol tool and keep Playwright for the experience layer.
  • Packing too many sessions into one browser. Running many contexts or tabs in a single browser process is cheaper, but they share one process, its CPU time, and one crash domain, so the contention stops resembling independent users. The Vitals you measure drift away from what real, separate browsers would record.
  • Reusing the same data for every user. If every virtual user logs in as the same account and searches the same term, your server caches the result and the test flatters itself. Give each user unique data so the load hits cold paths the way real traffic does.
  • Under-provisioning the generator. At roughly one core per browser, a handful of cores cannot feed a hundred browsers. An overloaded rig slows the browsers themselves, and you end up measuring your test machine instead of your site.
  • Reading only the server’s numbers. The reason to drive a real browser is to see what the server cannot report. If you then judge the test by response time alone, you have paid for browsers and thrown away what they captured.
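The unique-data point in the list above can be sketched as a small generator that derives each virtual user's inputs from its index. Everything here is hypothetical: the email pattern, the search terms, and the function name are placeholders for whatever accounts and queries your own test environment provides.

```typescript
interface VirtualUserData {
  email: string;
  searchTerm: string;
}

// Placeholder search terms; a real test would draw from production-like queries.
const SEARCH_TERMS = ["running shoes", "rain jacket", "wool socks", "trail backpack"];

// Derive distinct inputs from the virtual user's index so that no two users
// log in as the same account, and neighbouring users search different terms.
function dataForUser(userIndex: number): VirtualUserData {
  return {
    email: `loadtest+${userIndex}@example.com`,
    searchTerm: SEARCH_TERMS[userIndex % SEARCH_TERMS.length],
  };
}

// Data for a 100-user run.
const runData = Array.from({ length: 100 }, (_, i) => dataForUser(i));
```

Deriving the data from the index keeps a run reproducible: the same user number always gets the same account, which makes a failing session easier to replay.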

Where Evaluat fits

Evaluat is a real-browser performance testing platform built for exactly the slice Playwright cannot scale on its own. It runs each virtual user in its own isolated browser and captures Core Web Vitals under load, keeping the evidence for every user. Every virtual user is a real browser, so the numbers are what users see, not what scripts pretend.

In practice you build a journey once in a visual scenario editor, with no Playwright script to maintain, then run it at the concurrency you expect from London or Frankfurt. Each run reports LCP, INP, CLS, and First Contentful Paint per session and per URL, scored with Apdex, and keeps a video of every session, a network log of every request, and a console log of every message. When a session stalls at peak, you open it and watch the session that broke, instead of inferring it from a percentile.
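For readers unfamiliar with Apdex, the standard formula is short enough to sketch: samples at or under a threshold T count as satisfied, samples up to 4T count as tolerating at half weight, and the rest count as frustrated. The 2,500 ms threshold and the sample values below are assumptions for illustration; how Evaluat chooses thresholds or applies the score is not specified here.

```typescript
// Standard Apdex: (satisfied + tolerating / 2) / total samples.
function apdexScore(samplesMs: number[], thresholdMs: number): number {
  if (samplesMs.length === 0) {
    throw new Error("need at least one sample");
  }
  let satisfied = 0;
  let tolerating = 0;
  for (const ms of samplesMs) {
    if (ms <= thresholdMs) {
      satisfied += 1;
    } else if (ms <= 4 * thresholdMs) {
      tolerating += 1;
    }
    // Anything slower than 4x the threshold is frustrated and adds nothing.
  }
  return (satisfied + tolerating / 2) / samplesMs.length;
}

// Ten made-up LCP samples in milliseconds against an assumed 2500 ms threshold:
// six satisfied, two tolerating, two frustrated.
const sessionLcpMs = [900, 1100, 1500, 1900, 2200, 2500, 3000, 8000, 10500, 12000];
const lcpApdex = apdexScore(sessionLcpMs, 2500);
```

A single score like this is convenient for comparing runs, but it hides which sessions were frustrated, which is why per-session evidence still matters.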

The scoping here is the same line this article has drawn throughout. For raw API and protocol volume, a tool like k6 or JMeter is the right and cheaper instrument, and a hybrid that pairs it with real browsers is often the strongest test of all. For purely functional end-to-end checks, Playwright by itself is the right tool. Evaluat is for the question in between: what real browsers experience when the traffic is real.

So, can a browser automation tool drive virtual users? Yes, a real and useful handful of them, enough to measure what your pages feel like under load, as long as a protocol tool carries the volume and you size the rig for the browsers you run. Decide how much of each layer your question needs, and put a real browser on the journey that carries your revenue.

FAQ

Can Playwright do load testing?

Not on its own, but yes with help. Playwright drives one real browser and has no concept of a virtual user, so to generate concurrent load you pair it with an orchestrator like Artillery, which launches many browsers and ramps the traffic. Because each browser is resource heavy, this works best for a few dozen to a few hundred sessions rather than tens of thousands.

How many virtual users can Playwright simulate?

On a single machine, dozens to low hundreds, because each browser needs roughly one CPU core and hundreds of megabytes of memory. A common practical target is 50 to 100 concurrent sessions, which is usually enough to characterize the experience. To go higher you run the same test across a fleet of cloud workers and merge the results.

Is Playwright a performance testing tool?

Playwright is an end-to-end testing framework, but it can measure performance because it drives a real browser. A script can capture Core Web Vitals, navigation timing, and network and console activity for a session. It does not generate load by itself, so for load testing it is one piece of a larger setup rather than a complete tool.

Can Playwright measure Core Web Vitals?

Yes. Because Playwright runs a real browser, it can read the same Core Web Vitals the browser exposes to any user, including Largest Contentful Paint and Interaction to Next Paint. When you run it through Artillery, those metrics are captured and aggregated automatically across all virtual users as percentiles.

Playwright vs k6 for load testing, which should I use?

Use k6 when you need volume against an API or server, because a protocol tool runs tens of thousands of virtual users cheaply. Use Playwright, through Artillery, when you need to measure what the browser renders under load. They are not exclusive: k6 even ships a browser mode, and the common pattern is protocol traffic for volume plus a small share of real browsers for experience.

Do I still need a protocol load tool if I use Playwright?

For API and protocol endpoints, yes. A REST, gRPC, or WebSocket test has no page to render, so a real browser is wasted overhead there. The recommended pattern is a hybrid: a protocol tool generates most of the load while a small share of browser virtual users, 10 percent or less, measures the front-end experience.

Why are real browsers so resource-heavy for load testing?

A real browser is a full runtime that parses HTML, executes JavaScript, and paints the page, so each one needs hundreds of megabytes of memory and close to a CPU core. A protocol virtual user, by contrast, is just an HTTP client costing a few megabytes. That is why one machine runs tens of thousands of protocol users but only dozens to low hundreds of real browsers.
