This is the post I didn't want to write. It's about the day the core assumption of Conductor's architecture collapsed under its own weight — and what we learned rebuilding it.

The assumption

Conductor's compositor is a canvas element. Every video source — guest cameras, screen shares, graphics — gets drawn onto a 2D canvas at 30 frames per second. The canvas produces a MediaStream via captureStream(30), and that stream goes to the IVS Web Broadcast SDK for delivery to viewers.
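In code, the pipeline looks roughly like this. This is a condensed sketch, not Conductor's actual compositor: the 2×2 grid layout, the sources array, and the timer-driven draw cadence are illustrative, and the IVS hookup is omitted.

// One <video> element per guest, fed by their WebRTC tracks (illustrative)
const sources: HTMLVideoElement[] = [];

const canvas = document.createElement('canvas');
canvas.width = 1920;
canvas.height = 1080;
const ctx = canvas.getContext('2d')!;

// Draw every source into a simple 2x2 grid. Real scenes have arbitrary
// layouts, crops, and graphics layers on top.
function drawScene() {
  for (let i = 0; i < sources.length; i++) {
    const x = (i % 2) * (canvas.width / 2);
    const y = Math.floor(i / 2) * (canvas.height / 2);
    ctx.drawImage(sources[i], x, y, canvas.width / 2, canvas.height / 2);
  }
}
setInterval(drawScene, 1000 / 30);

// captureStream(30) turns the canvas into a 30fps MediaStream.
// That stream's video track is what the IVS Web Broadcast SDK encodes.
const output: MediaStream = canvas.captureStream(30);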

This architecture is elegant. The browser handles compositing, layout, scene switching, and graphics rendering. It's the same rendering engine that draws every website you visit, repurposed to draw a live broadcast. No plugins. No native code. No external compositor.

At 720p, it works beautifully. At 1080p30, it falls apart.

The symptoms

During an end-to-end stream test, we opened the IVS console and saw something alarming: Starvation Start events firing every few seconds. The average bitrate was 95 kbps — roughly the quality of a security camera from 2005. The framerate was erratic, jumping between 8 and 24 fps. The watch page showed a permanent "Buffering..." overlay.

This was on an M3 Max MacBook Pro with 64GB of RAM. Not exactly a weak machine.

We opened chrome://webrtc-internals and found the smoking gun: powerEfficientEncoder: false. The IVS Web Broadcast SDK was using Chrome's software H.264 encoder — not the M3's dedicated hardware media engine. On macOS, Chrome's WebRTC implementation defaults to software encoding even when hardware encoding is available. This is a known issue with no clean workaround.
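You can read the same signal programmatically instead of eyeballing webrtc-internals. A sketch, assuming you have a handle on the RTCPeerConnection the SDK opens: encoderImplementation and qualityLimitationReason are standard outbound-rtp stats fields, though the exact values Chrome reports vary by version.

async function checkEncoder(pc: RTCPeerConnection) {
  const stats = await pc.getStats();
  stats.forEach((report) => {
    if (report.type === 'outbound-rtp' && report.kind === 'video') {
      // e.g. a software encoder name vs. a platform hardware encoder
      console.log('encoder:', report.encoderImplementation);
      // "cpu" means exactly what it says: the encoder can't keep up
      console.log('limited by:', report.qualityLimitationReason);
    }
  });
}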

Understanding the CPU budget

A browser tab running Conductor's director view is doing an extraordinary amount of work simultaneously:

Daily.co WebRTC — receiving and decoding multiple participant video streams. Each guest is a separate WebRTC peer connection with its own decode pipeline. Five guests means five simultaneous video decodes.

Canvas compositor — drawing every video frame, every graphic, every overlay at 30fps. Each frame involves reading pixels from multiple video elements, transforming them (scale, position, crop), and writing the composited result to the output canvas. At 1920×1080, that's over 2 million pixels per frame, 30 times per second.

IVS SDK encoder — taking the canvas output stream and encoding it to H.264 in software. Video encoding is one of the most CPU-intensive operations in computing. At 1080p30, the encoder needs to compress 60 megapixels per second into a 2-4 Mbps bitstream. In software. On the same CPU that's doing everything else.

React UI rendering — the director interface itself. Signal health panel, guest grid, sidebar tabs, rundown timer. Every state update triggers a React render cycle.

These four workloads compete for the same CPU cores. When the encoder falls behind, Chrome's WebRTC implementation declares starvation and drops quality to catch up. The bitrate plummets. The framerate stutters. The viewer sees garbage.
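One way to watch the contention directly is to time the draw loop itself. A minimal sketch, assuming the compositor schedules frames with requestAnimationFrame: at 30fps, every frame over the ~33ms budget is a frame the encoder eventually pays for.

let last = performance.now();
let lateFrames = 0;

function tick(now: number) {
  // 1000ms / 30fps ≈ 33ms per composited frame. When decode, encode,
  // and React renders contend for the same cores, frames run long.
  if (now - last > 33) lateFrames++;
  last = now;
  // drawScene() would run here (see the sketch above)
  requestAnimationFrame(tick);
}
requestAnimationFrame(tick);

setInterval(() => {
  console.log(`late frames in last 5s: ${lateFrames}`);
  lateFrames = 0;
}, 5000);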

The quick fix: drop resolution

The immediate mitigation was obvious: reduce the load. We changed the canvas compositor from 1920×1080 to 1280×720, and dropped the frame rate from 30 to 24.

// Before: 1080p30
const before = { width: 1920, height: 1080, fps: 30 };
// 1920 × 1080 × 30 ≈ 62.2 million pixels per second

// After: 720p24
const after = { width: 1280, height: 720, fps: 24 };
// 1280 × 720 × 24 ≈ 22.1 million pixels per second

720p24 uses roughly one-third the CPU of 1080p30. The starvation events stopped immediately. The bitrate climbed from 95 kbps to 600+ kbps. The framerate stabilized at 24-27 fps. The watch page stopped buffering.

But 720p24 is not broadcast quality. It's acceptable for a demo. It's not acceptable for a product that claims to deliver broadcast-quality production from the browser.

Why vMix doesn't have this problem

We spent a lot of time studying how vMix and Wirecast solved this. The answer was instructive.

vMix runs on Windows and uses DirectX for GPU-accelerated compositing. Every source — cameras, NDI streams, graphics — is a texture on the GPU. The compositor never touches the CPU for pixel work. The CPU handles logic and control. The GPU handles every pixel operation: scaling, positioning, blending, transitions, and encoding. vMix can run 40+ sources at 4K60 on commodity hardware because the GPU is doing the heavy lifting.

Wirecast does the same thing on Windows and macOS, using DirectX and Metal respectively.

The browser canvas compositor does the opposite. It's all CPU. There is no GPU path for canvas.drawImage() in Chrome's 2D rendering context that feeds directly into a hardware encoder. The pixels go from the GPU (where the video frames are decoded) to the CPU (where the canvas composites them), then stay on the CPU (where the software encoder compresses them). Two unnecessary passes through the CPU.

The browser is not the right place to composite broadcast video. It's the right place to control the compositor. Those are two different things.

The real fix: server-side GPU compositing

The lesson from vMix was clear: separate the control plane from the compositing plane. The browser should be the director's interface — handling UI, controls, scene selection, and graphics configuration. The actual pixel work should happen on a machine with a GPU.

This led us to Ant Media Server. Ant Media runs on EC2 instances (including GPU instances like g4dn.xlarge with NVIDIA T4 GPUs). It accepts WebRTC from the browser, composites multiple video streams server-side, encodes with hardware H.264, and outputs to multiple destinations: WebRTC to viewers for low-latency delivery, RTMP to IVS for HLS CDN delivery, and RTMP to YouTube/LinkedIn for multi-destination streaming.

The architecture becomes:

Browser (director UI + controls)
  → WebRTC individual streams
    → Ant Media Server (EC2 with GPU)
      → Hardware H.264 compositing + encoding
        → WebRTC to viewers (low-latency)
        → RTMP to IVS → HLS via CloudFront (scale)
        → RTMP to YouTube/LinkedIn (multi-destination)

The director's machine, even a MacBook Air, now does exactly what it should: run a web application. The compositing, encoding, and delivery happen on dedicated infrastructure designed for this workload. The laptop's cores can focus on rendering a smooth UI instead of fighting with the encoder for CPU time.
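From the browser's side, publishing to the compositor is one WebRTC publish per participant. A sketch assuming Ant Media's WebRTC JavaScript SDK (@antmedia/webrtc_adaptor); the websocket URL, app name, and stream ID are placeholders, not Conductor's real endpoints.

import { WebRTCAdaptor } from '@antmedia/webrtc_adaptor';

// Each participant publishes their raw stream to the show's dedicated
// Ant Media instance. Compositing happens server-side, not here.
const adaptor = new WebRTCAdaptor({
  websocket_url: 'wss://show-abc123.example.com:5443/WebRTCAppEE/websocket', // placeholder
  mediaConstraints: { video: { width: 1280, height: 720 }, audio: true },
  callback: (info: string) => {
    if (info === 'initialized') {
      adaptor.publish('guest-1'); // placeholder stream ID
    }
  },
  callbackError: (err: string) => console.error('Ant Media error:', err),
});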

Per-show infrastructure

The model we're building provisions a dedicated Ant Media instance for each show. When a director creates a show, an EC2 spot instance spins up with Ant Media pre-configured. When the show ends, the instance tears down. A c6i.2xlarge spot instance runs roughly $0.10/hour, so a two-hour show costs about $0.20 in compute.
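The provisioning call itself is small. A sketch assuming the AWS SDK for JavaScript v3; the AMI ID is a placeholder for a pre-baked image with Ant Media installed, and the real system would also handle teardown and health checks.

import { EC2Client, RunInstancesCommand } from '@aws-sdk/client-ec2';

const ec2 = new EC2Client({ region: 'us-east-1' });

// One dedicated compositor instance per show.
async function provisionShowInstance(showId: string) {
  const result = await ec2.send(new RunInstancesCommand({
    ImageId: 'ami-0123456789abcdef0', // placeholder: pre-baked Ant Media AMI
    InstanceType: 'c6i.2xlarge',
    MinCount: 1,
    MaxCount: 1,
    InstanceMarketOptions: { MarketType: 'spot' }, // roughly $0.10/hour
    TagSpecifications: [{
      ResourceType: 'instance',
      Tags: [{ Key: 'show', Value: showId }],
    }],
  }));
  return result.Instances?.[0]?.InstanceId;
}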

This is extraordinary economics. A $799/month Professional plan customer running 10 two-hour shows per month uses about $2 in compositor infrastructure. The per-show cost is essentially zero at any reasonable scale.

And because each show gets its own instance, there's complete isolation. One show's encoder load never affects another show. One show's crash never takes down another show. Every production runs on dedicated infrastructure.

What we learned

The CPU wall taught us something important about building browser-native tools: the browser is the best interface layer ever built, but it's not a general-purpose computation environment. Using it for what it's good at — real-time UI, WebRTC communication, responsive controls — produces extraordinary results. Using it for what it's not good at — sustained heavy computation like video encoding — produces starvation events at 95 kbps.

The right architecture isn't "everything in the browser" or "nothing in the browser." It's knowing which parts belong where. The director interface belongs in the browser. The compositor belongs on a GPU. The delivery belongs on a CDN. Each piece runs where it runs best.

720p24 bought us time. Server-side GPU compositing buys us the future.


Next in this series: The three delivery paths: WebRTC, HLS, and multi-destination RTMP