Effective Per-Title Encoding For UGC Videos Using Machine Learning

Conference Proceedings


Description

Per-title encoding, first proposed by Netflix [1], aims to achieve the best visual quality for any given video content subject to a predefined maximum bitrate constraint. Ideally, one obtains the quality-bitrate convex hull of a given video by encoding it at typical (bitrate, resolution) ladder points, plotting the resulting quality-bitrate curves, and taking their upper envelope.
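The convex-hull step can be illustrated with a short Python sketch. It takes a set of trial encodes, each treated as a (bitrate, VMAF, resolution) point, and keeps the upper convex hull using Andrew's monotone-chain algorithm. The hull is then cut off at the highest-quality point, because spending more bits beyond that point buys no quality. The `Encode` type, the choice of VMAF as the quality metric, and the sample numbers are illustrative assumptions. This is not Netflix's implementation, and the numbers are not measured data.

```python
from typing import List, NamedTuple


class Encode(NamedTuple):
    bitrate_kbps: float  # measured bitrate of one trial encode
    vmaf: float          # quality score of that encode
    height: int          # encoded resolution (frame height), e.g. 1080


def _cross(o: Encode, a: Encode, b: Encode) -> float:
    # z-component of (a - o) x (b - o); >= 0 means "not a clockwise turn"
    return ((a.bitrate_kbps - o.bitrate_kbps) * (b.vmaf - o.vmaf)
            - (a.vmaf - o.vmaf) * (b.bitrate_kbps - o.bitrate_kbps))


def upper_convex_hull(points: List[Encode]) -> List[Encode]:
    """Return the rising part of the upper convex hull, ordered by bitrate."""
    # Sort by bitrate; at equal bitrate put the higher-quality encode first
    pts = sorted(points, key=lambda p: (p.bitrate_kbps, -p.vmaf))
    hull: List[Encode] = []
    for p in pts:
        # Drop points that would make the envelope bend upward (non-concave)
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    if not hull:
        return []
    # Keep only up to the first quality peak: extra bits beyond it are wasted
    peak = max(range(len(hull)), key=lambda i: hull[i].vmaf)
    return hull[:peak + 1]


if __name__ == "__main__":
    # Hypothetical trial encodes across a 360p..1080p ladder
    trials = [
        Encode(500, 60, 360), Encode(1000, 72, 360), Encode(1000, 75, 540),
        Encode(2000, 80, 540), Encode(2000, 85, 720), Encode(3000, 88, 720),
        Encode(3000, 90, 1080), Encode(6000, 93, 1080), Encode(8000, 93, 1080),
    ]
    for e in upper_convex_hull(trials):
        print(e)
```

Each point left on the hull names the resolution that gives the best quality at that bitrate. Encoding every video this many times is exactly the cost that becomes impractical at UGC scale.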

For UGC (user-generated content) videos, it is not practical to obtain the convex hull of every single video, because the volume of UGC to process is usually enormous and each individual video has to be processed quickly. Deriving a per-title-like approach for UGC has therefore become a challenging but attractive research topic, and quite a few state-of-the-art approaches have been proposed, most notably those based on machine learning.

In this talk, we first outline the typical UGC per-title problem as follows:
1. A set of (bitrate, resolution) ladders is predefined.
2. A maximum bitrate, representing the real-time bandwidth constraint, is specified.
3. We need to decide (a) which resolution to choose and (b) which CRF (constant rate factor) value to configure for the encoder, so that the encoding bitrate satisfies the maximum bitrate constraint while achieving the best possible visual quality.

We have practiced the following approach for UGC per-title-like encoding:
Step 1: Extract spatial/temporal features from a given UGC video.
Step 2: Pre-train a machine learning model that maps the features extracted in Step 1 to the triplet (bitrate, VMAF, CRF) for every predefined resolution.
Step 3: For a given maximum bitrate constraint and the predefined (bitrate, resolution) ladders, use the model to predict the resolution to choose and the encoder CRF parameter.

Using this approach, we can effectively address the following issues in multi-rendition adaptive encoding/transcoding:
(1) The VMAF-bitrate curve usually levels off once the bitrate reaches a certain level, so an unnecessarily large bitrate is sometimes spent chasing a very high VMAF score. Accepting a slightly lower VMAF score can yield a much lower bitrate.
(2) The predefined (bitrate, resolution) ladders may be over-defined for a certain video category, meaning that more ladders than necessary have been predefined. For certain UGC video categories, some (bitrate, resolution) ladders can be removed in advance, which significantly speeds up per-title processing.

Overall, we will demonstrate that machine-learning-based per-title encoding can be applied efficiently and effectively to UGC videos. It delivers a better visual experience at lower bitrates and can be processed at fairly low computational complexity.

Reference:
[1] Anne Aaron, Zhi Li, Megha Manohara, Jan De Cock and David Ronca, “Per-Title Encode Optimization”, originally published at techblog.netflix.com on December 14, 2015.

This talk was presented at Demuxed ’22, a conference for video nerds in San Francisco featuring amazing talks like this one. Demuxed ’22 was made possible by sponsors like our Platinum sponsor Daily (https://daily.co) and organized by people from Mux (https://mux.com). For more information about the conference and community, see https://2022.demuxed.com.
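The decision in Step 3, together with the bitrate saving described in issue (1), can be sketched as a small selection routine. The sketch assumes that the Step 2 model has already produced a few predicted (CRF, bitrate, VMAF) points for each ladder resolution. The talk does not specify that output format. The `Prediction` type, the `choose_encoding` function, the `vmaf_slack` parameter, and the numbers in the usage example are all hypothetical. With `vmaf_slack=0`, the routine returns the candidate with the highest predicted VMAF under the bitrate cap. A positive slack accepts a small VMAF loss in exchange for the cheapest qualifying candidate.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Prediction:
    crf: int             # encoder CRF this prediction refers to
    bitrate_kbps: float  # model-predicted output bitrate at this CRF
    vmaf: float          # model-predicted VMAF at this CRF


def choose_encoding(
    predictions: Dict[int, List[Prediction]],  # keyed by ladder height, e.g. 720
    max_bitrate_kbps: float,
    vmaf_slack: float = 0.0,
) -> Optional[Tuple[int, Prediction]]:
    """Pick (resolution, CRF) with the best predicted VMAF under the cap.

    Any candidate within `vmaf_slack` points of the best feasible VMAF is
    acceptable; among those, the lowest predicted bitrate wins.
    """
    feasible = [
        (height, p)
        for height, preds in predictions.items()
        for p in preds
        if p.bitrate_kbps <= max_bitrate_kbps
    ]
    if not feasible:
        return None  # nothing fits the cap; caller decides the fallback
    best_vmaf = max(p.vmaf for _, p in feasible)
    good_enough = [(h, p) for h, p in feasible if p.vmaf >= best_vmaf - vmaf_slack]
    # Cheapest acceptable candidate; ties broken by higher VMAF
    return min(good_enough, key=lambda hp: (hp[1].bitrate_kbps, -hp[1].vmaf))


if __name__ == "__main__":
    # Hypothetical model output for one video (illustrative numbers only)
    preds = {
        540: [Prediction(23, 1800, 82.0), Prediction(28, 1100, 78.0)],
        720: [Prediction(23, 3200, 90.0), Prediction(28, 1900, 86.5),
              Prediction(33, 1200, 81.0)],
        1080: [Prediction(23, 6000, 93.0), Prediction(28, 3500, 91.0),
               Prediction(33, 2100, 87.0)],
    }
    print(choose_encoding(preds, 2500))                  # 1080p at CRF 33
    print(choose_encoding(preds, 2500, vmaf_slack=1.0))  # 720p at CRF 28
```

In this toy case, a 1-point VMAF slack moves the choice from 1080p at about 2100 kbps to 720p at about 1900 kbps. That is the kind of trade described in (1). Because the routine only reads model predictions, no trial encodes are needed at decision time.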