Conference Proceedings
Why I ditched cloud transcoding and set up a mac mini render farm
Description
“There are a number of use cases where you need to tightly integrate video processing with some form of AI processing on the video, like AI upscaling, real-time lip syncing or gaze correction, or virtual backgrounds. The killer bottleneck for these tasks is almost always CPU-to-GPU communication, which makes them impractical or expensive for real-time applications.
The normal approach is to set up a fully GPU-based pipeline: use the GPU's video decoder and build your AI processing directly in CUDA (potentially with custom CUDA kernels) instead of through abstracted libraries like PyTorch. It's doable – we did this at Streamyard – but it's very hard to build and high-maintenance.
I stumbled upon a ridiculous-sounding but actually plausible alternative: writing neural networks in WebGPU, using WebCodecs to decode and encode frames, and using WebGPU to access and manipulate the frame data directly. When a pipeline like this runs on an Apple M4 chip, it can run AI video processing loads 4-5x faster than an equivalent pipeline on a GPU-enabled cloud instance, and it's also ~5x cheaper to rent an M4 than an entry-level GPU.
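The decode-to-GPU handoff described above can be sketched with the WebCodecs and WebGPU APIs. This is a minimal, hypothetical, browser-only sketch – `runInferencePass` is a placeholder for the actual compute-pass dispatch, and `makeDecoderConfig` is just an illustrative helper:

```javascript
// Hypothetical sketch of the WebCodecs -> WebGPU handoff. VideoDecoder and
// GPUDevice.importExternalTexture exist only in the browser; nothing here
// is the speaker's actual code.

// Pure helper: build a WebCodecs decoder config (runs anywhere).
function makeDecoderConfig(codec, width, height) {
  return {
    codec, // e.g. "avc1.64001f" (H.264 High profile) – placeholder value
    codedWidth: width,
    codedHeight: height,
    hardwareAcceleration: "prefer-hardware", // lean on the chip's media engine
  };
}

function startPipeline(device /* GPUDevice */) {
  const decoder = new VideoDecoder({
    output: (frame) => {
      // Import the decoded VideoFrame as a GPU external texture with no CPU
      // round-trip – this avoids the CPU<->GPU copy named as the bottleneck.
      const tex = device.importExternalTexture({ source: frame });
      runInferencePass(device, tex); // placeholder: WebGPU compute/render pass
      frame.close(); // release the frame back to the decoder promptly
    },
    error: (e) => console.error("decode error:", e),
  });
  decoder.configure(makeDecoderConfig("avc1.64001f", 1920, 1080));
  return decoder;
}
```

Processed frames would then go back out through a `VideoEncoder` configured the same way, keeping the whole loop on the GPU.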
I’ve done a number of tests, and my conclusion is that this is efficient because the M4 chip (1) has encoding optimizations built into its hardware, and (2) has GPU-to-CPU communication that is much more tightly integrated than in a typical GPU + cloud VM setup.
All of these technologies are relatively new – the M4 chip, WebGPU, WebCodecs – but together I think those three present a viable alternative to traditional AI video processing pipelines.
Browser-based encoding also makes it much easier to integrate ‘simpler’ manipulations, like adding ‘banners’ to live streams, and much easier to integrate with WebRTC workflows. I have a tendency to write my own WebGPU neural networks, but TensorFlow.js has improved in the last few years to accommodate importing video frames from the GPU. There are certainly downsides, like working with specialty Mac cloud providers, but this approach might actually be practical and easier to maintain (at least on the software side).
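As a toy illustration of how ‘simple’ a banner overlay is in this pipeline: it reduces to drawing a textured quad over the decoded frame, and the placement math is plain arithmetic. The clip-space convention below is WebGPU's; the helper itself is hypothetical:

```javascript
// Hypothetical helper: convert a banner rectangle given in pixels (top-left
// origin) into WebGPU clip-space coordinates (x, y in [-1, 1], y pointing up),
// ready to feed the vertex buffer of a quad drawn over the video frame.
function bannerQuad(px, py, pw, ph, frameW, frameH) {
  const x0 = (px / frameW) * 2 - 1;        // left edge
  const x1 = ((px + pw) / frameW) * 2 - 1; // right edge
  const y0 = 1 - (py / frameH) * 2;        // top edge (flip y)
  const y1 = 1 - ((py + ph) / frameH) * 2; // bottom edge
  // Two triangles covering the rectangle.
  return new Float32Array([
    x0, y1,  x1, y1,  x1, y0,
    x0, y1,  x1, y0,  x0, y0,
  ]);
}
```

A full-frame banner, `bannerQuad(0, 0, 1920, 1080, 1920, 1080)`, maps to the corners (-1, -1) through (1, 1), i.e. the whole clip-space viewport.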
This talk was presented at Demuxed 2025 in London, a conference by and for engineers working in video. Every year we host a conference with lots of great new talks like this – learn more at https://demuxed.com”
Other Proceedings
Here are some other proceedings that you might find interesting.
What Codec Should I Use?
Alan Resnick
Doing Server-Side Ad Insertion on Live Sports for 25.3M Concurrent Users
Ashutosh Agrawal
Is now the time to solve the deepfake threat?
Roderick Hodgson
Super Resolution: The scaler of tomorrow, here today!
Nick Chadwick
The do's and don'ts about Streaming security
Javier Brines Garcia
Modeling the conceptual structure of FFmpeg in JavaScript
Ryan Harvey
Objectionable Uses of Objective Quality Metrics
Richard Fliam
RTMP: web video innovation or Web 1.0 hack… how did we get to now?
Sarah Allen
Large-Scale Media Archive Migration to the Cloud
Konstantin Wilms
HEVC Upload Experiments
Chris Ellsworth