Conference Proceedings
Why I ditched cloud transcoding and set up a mac mini render farm
Description
“There are a number of use cases where you need to tightly integrate video processing with some form of AI processing on the video, like AI upscaling, real-time lip syncing or gaze correction, or virtual backgrounds. The killer bottleneck for these tasks is almost always CPU-to-GPU communication, which makes them impractical or expensive for real-time applications.
The normal approach is to set up a fully GPU-based pipeline: use the GPU's video decoder and build your AI processing directly in CUDA (potentially with custom CUDA kernels) instead of through abstracted libraries like PyTorch. It's doable – we did this at Streamyard – but it's very hard to build and high-maintenance.
I stumbled upon a ridiculous-sounding but actually plausible alternative: writing neural networks in WebGPU, using WebCodecs to decode and encode frames, and using WebGPU to access and manipulate the frame data directly. When a pipeline like this runs on an Apple M4 chip, it can run AI video processing loads 4-5x faster than an equivalent pipeline on a GPU-enabled cloud instance, and it's also ~5x cheaper to rent an M4 than an entry-level GPU.
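The decode-to-GPU handoff described above can be sketched with the WebCodecs and WebGPU APIs. This is a minimal, hypothetical, browser-only sketch – `runInferencePass` is a placeholder for the actual compute-pass dispatch, and `makeDecoderConfig` is just an illustrative helper:

```javascript
// Hypothetical sketch of the WebCodecs -> WebGPU handoff. VideoDecoder and
// GPUDevice.importExternalTexture exist only in the browser; nothing here
// is the speaker's actual code.

// Pure helper: build a WebCodecs decoder config (runs anywhere).
function makeDecoderConfig(codec, width, height) {
  return {
    codec, // e.g. "avc1.64001f" (H.264 High profile) – placeholder value
    codedWidth: width,
    codedHeight: height,
    hardwareAcceleration: "prefer-hardware", // lean on the chip's media engine
  };
}

function startPipeline(device /* GPUDevice */) {
  const decoder = new VideoDecoder({
    output: (frame) => {
      // Import the decoded VideoFrame as a GPU external texture with no CPU
      // round-trip – this avoids the CPU<->GPU copy named as the bottleneck.
      const tex = device.importExternalTexture({ source: frame });
      runInferencePass(device, tex); // placeholder: WebGPU compute/render pass
      frame.close(); // release the frame back to the decoder promptly
    },
    error: (e) => console.error("decode error:", e),
  });
  decoder.configure(makeDecoderConfig("avc1.64001f", 1920, 1080));
  return decoder;
}
```

Processed frames would then go back out through a `VideoEncoder` configured the same way, keeping the whole loop on the GPU.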
I’ve done a number of tests, and my conclusion is that this is efficient because the M4 chip (1) has encoding optimizations built into its hardware, and (2) has GPU-to-CPU communication that is much more tightly integrated than in a typical GPU + cloud VM setup.
All of these technologies are relatively new – the M4 chip, WebGPU, WebCodecs – but together I think those three present a viable alternative to traditional AI video processing pipelines.
Browser-based encoding also makes it much easier to integrate ‘simpler’ manipulations, like adding ‘banners’ to live streams, and much easier to integrate with WebRTC workflows. I have a tendency to write my own WebGPU neural networks, but TensorFlow.js has improved in the last few years to accommodate importing video frames from the GPU. There are certainly downsides, like working with specialty Mac cloud providers, but this approach might actually be practical and easier to maintain (at least on the software side).
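As a toy illustration of how ‘simple’ a banner overlay is in this pipeline: it reduces to drawing a textured quad over the decoded frame, and the placement math is plain arithmetic. The clip-space convention below is WebGPU's; the helper itself is hypothetical:

```javascript
// Hypothetical helper: convert a banner rectangle given in pixels (top-left
// origin) into WebGPU clip-space coordinates (x, y in [-1, 1], y pointing up),
// ready to feed the vertex buffer of a quad drawn over the video frame.
function bannerQuad(px, py, pw, ph, frameW, frameH) {
  const x0 = (px / frameW) * 2 - 1;        // left edge
  const x1 = ((px + pw) / frameW) * 2 - 1; // right edge
  const y0 = 1 - (py / frameH) * 2;        // top edge (flip y)
  const y1 = 1 - ((py + ph) / frameH) * 2; // bottom edge
  // Two triangles covering the rectangle.
  return new Float32Array([
    x0, y1,  x1, y1,  x1, y0,
    x0, y1,  x1, y0,  x0, y0,
  ]);
}
```

A full-frame banner, `bannerQuad(0, 0, 1920, 1080, 1920, 1080)`, maps to the corners (-1, -1) through (1, 1), i.e. the whole clip-space viewport.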
This talk was presented at Demuxed 2025 in London, a conference by and for engineers working in video. Every year we host a conference with lots of great new talks like this – learn more at https://demuxed.com”
Other Proceedings
Here are some other proceedings that you might find interesting.
What Codec Should I Use?
Alan Resnick
Doing Server-Side Ad Insertion on Live Sports for 25.3M Concurrent Users
Ashutosh Agrawal
Is now the time to solve the deepfake threat?
Roderick Hodgson
Super Resolution: The scaler of tomorrow, here today!
Nick Chadwick
The do's and don'ts about Streaming security
Javier Brines Garcia
Modeling the conceptual structure of FFmpeg in JavaScript
Ryan Harvey
Objectionable Uses of Objective Quality Metrics
Richard Fliam
RTMP: web video innovation or Web 1.0 hack… how did we get to now?
Sarah Allen
Large-Scale Media Archive Migration to the Cloud
Konstantin Wilms
HEVC Upload Experiments
Chris Ellsworth