Conference Proceedings

Content Aware Encoding for low latency live streaming encoders using deep learning

Description

Livestreaming has emerged as a captivating medium that is reshaping how we engage, communicate, and entertain. Platforms like Twitch, YouTube Live, Facebook Live, and others have become go-to destinations for audiences seeking real-time experiences, immediate interaction, and the thrill of being part of a live event. Real-time transcoding plays a crucial role in delivering high-quality, compatible, and optimized video content to these audiences across a wide range of devices and platforms.

Most of this transcoding today occurs using a fixed adaptive bitrate (ABR) ladder: a predetermined set of bitrate/resolution combinations for each encoded stream. This static, one-size-fits-all approach is blind to content type, leading to inefficient use of bandwidth and suboptimal video quality (VQ). To overcome these limitations, Netflix pioneered content-aware encoding (also called per-shot encoding) for the VOD use case. The video content is analyzed offline for every shot, and efficient encoding decisions such as bitrate, resolution, and quantization level are chosen from the convex hull of that shot (the best quality-bitrate points obtained from an ocean of encodes with different parameters) to maximize VQ while minimizing bandwidth requirements.

Live streaming does not have the luxury of the "infinite" latency that VOD offers, and real-time transcoding at scale puts an additional cap on processing capacity. This is a challenging problem that has attracted quite a bit of research in recent times, and there are several approaches to content-aware encoding for low-latency encoding. Finding the best possible quality-bitrate trade-off in real time with the available compute, while maintaining latency, is the name of this game.

In this talk, we showcase our work using deep learning (DL) to predict the "optimal" bitrate for incoming video in real time using data from the input and the encoder lookahead. We train a fully connected regression network on input statistics (luma histogram) and encoder lookahead statistics (SAD, motion-vector, and activity histograms). The ground truth for our purpose is the bitrate that achieves a minimum VMAF value of 90 (our minimum quality bar) for each chosen shot during training. This regression network is very light on compute and can run efficiently without affecting the real-time performance or density of the encoder.
The network needs a minimum of four frames of lookahead data to produce high prediction accuracy. We trained it to achieve maximum savings for low-complexity content with negligible loss in video quality, and to bypass very high-complexity content. We tested the algorithm using a variety of video clips downloaded from Twitch.tv, with the following results:

1. Bitrate savings of more than 30%, with less than 1 VMAF point of degradation, for easy content such as talking heads and other low-complexity material.
2. Bitrate savings of 9% on average for medium-complexity content, again with less than 1 VMAF point of degradation.
3. Negligible savings for high-complexity content (the algorithm knows that lowering the bitrate would cause VQ degradation).

The reasons for choosing a deep learning-based approach to predict the CAE bitrate over traditional approaches are twofold. First, the nonlinear function learned by the DL model delivers precise bitrate savings without degrading VQ. Second, DL models can be trained or retrained by the content distributor on a proprietary, specific content set to maximize bitrate savings while maintaining high VQ. This approach is applicable to both hardware- and software-based encoders that have access to the encoder lookahead statistics mentioned above, and the resulting bitrate savings can translate into substantial savings on CDN bandwidth and storage costs for content distributors.

This talk was presented at Demuxed ’23, a conference for video nerds in San Francisco featuring amazing talks like this one.
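The ground-truth labeling described above (the lowest bitrate that still reaches VMAF 90 for a shot) can be sketched as a search over a bitrate ladder. The `vmaf_of` callback, the ladder values, and the toy quality model below are all hypothetical stand-ins for a real encode-and-score pipeline:

```python
def ground_truth_bitrate(shot, bitrate_ladder_kbps, vmaf_of, target=90.0):
    """Return the lowest ladder bitrate whose encode reaches VMAF >= target.

    vmaf_of(shot, kbps) is a stand-in for encoding the shot at that bitrate
    and scoring the result with VMAF. Because quality is monotone in bitrate,
    the first ladder rung (ascending) that clears the bar is the label.
    Falls back to the highest bitrate if nothing reaches the target.
    """
    for kbps in sorted(bitrate_ladder_kbps):
        if vmaf_of(shot, kbps) >= target:
            return kbps
    return max(bitrate_ladder_kbps)

# Toy monotone quality model standing in for a real encode + VMAF run.
fake_vmaf = lambda shot, kbps: min(100.0, 60.0 + 10.0 * (kbps / 1000.0))
ladder = [1000, 2000, 3000, 4000, 6000]
print(ground_truth_bitrate("shot_001", ladder, fake_vmaf))  # → 3000
```

In a real training pipeline this search is run offline per shot, so its cost does not affect the live path; only the trained regression network runs in real time.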

Conference

Demuxed 2023

Speakers

Ramdas Satyan

Learning Categories

Low Latency
ABR
CAE
VMAF

Other Proceedings

Here are some other proceedings that you might find interesting.

What Codec Should I Use?

Alan Resnick

Doing Server-Side Ad Insertion on Live Sports for 25.3M Concurrent Users

Ashutosh Agrawal

Is now the time to solve the deepfake threat?

Roderick Hodgson

Super Resolution: The scaler of tomorrow, here today!

Nick Chadwick

The do's and don'ts about Streaming security

Javier Brines Garcia

Modeling the conceptual structure of FFmpeg in JavaScript

Ryan Harvey

Objectionable Uses of Objective Quality Metrics

Richard Fliam

RTMP: web video innovation or Web 1.0 hack… how did we get to now?

Sarah Allen

Large-Scale Media Archive Migration to the Cloud

Konstantin Wilms

HEVC Upload Experiments

Chris Ellsworth


© Copyright Streaming Video Technology Alliance (SVTA).
