
Conference Proceedings

Why Video Captioning Needs Built-In Viewer Feedback in 2022 (and How We Do It)

Description

We’ve all heard the hype, excitement, and fear about AI systems getting smarter and smarter, developing sentience, and generally taking over the world, creating a future of subjugation and despair for the human race. However, I am fairly confident that this bleak picture is not in our near future, because there is one major problem: AI doesn’t even understand us that well.

Anyone who has used voice recognition on their phone or in their car will recognize that speech-to-text technology still has a long way to go. In the video world, this is nowhere more obvious than in auto-generated video captioning. While auto-generated captions are better than no captions at all, incorrect spellings, wrong words, bad punctuation, and misplaced phrase breaks, among other discrepancies, mean that human review and improvement is still needed for captions to accurately represent what is said and heard in videos. (If you watch a video for any length of time and don’t notice any errors, that is thanks to human review!)

Accuracy in captioning is not a trivial matter, and captioning errors are not just a minor annoyance. ADA accessibility compliance demands 99% accurate captions, speaker labels, and phrase breaks, among other requirements that none of the auto-generated captioning services on the market today meet. Yet most auto-generated caption errors can be fixed by far more people than just costly transcribers. That’s why I propose that, while the speech recognition wizards keep improving their methods and services, it is on us video engineers to give interested viewers (those who are already watching and motivated to fix the errors they see) an easy way to submit feedback that improves transcriptions of both recorded and live video. The goal is to increase accuracy and watchability for fellow viewers while also giving the machines better and better data to keep improving.

In this talk, I will give a brief review of current speech-to-text technology, where it is limited, and why it will remain limited until completely new techniques come along. Then I will outline both high-level ideas and actionable steps for video developers to add feedback systems to their video players.
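To give a flavor of the player-side hook being described, here is a minimal TypeScript sketch. Every name in it (CaptionCue, findActiveCue, buildFeedback, the "report caption error" button it assumes) is illustrative, not code from the talk: the idea is simply that the player already knows which cue is on screen at the moment a viewer clicks, so feedback can be tied to a precise time range.

```typescript
// Illustrative sketch only: these names are hypothetical, not from the talk.

interface CaptionCue {
  start: number; // cue start time, in seconds
  end: number;   // cue end time, in seconds
  text: string;  // caption text currently shown to the viewer
}

// Feedback payload a player could POST to a correction backend.
interface CaptionFeedback {
  cueStart: number;
  cueEnd: number;
  originalText: string;
  suggestedText: string;
}

// Find the cue on screen at the given playback time, or null if none.
function findActiveCue(cues: CaptionCue[], time: number): CaptionCue | null {
  return cues.find((c) => time >= c.start && time < c.end) ?? null;
}

// Package the active cue plus the viewer's suggested text for the backend.
function buildFeedback(cue: CaptionCue, suggestion: string): CaptionFeedback {
  return {
    cueStart: cue.start,
    cueEnd: cue.end,
    originalText: cue.text,
    suggestedText: suggestion,
  };
}
```

In a real player the cue list would come from the track’s TextTrackCueList and the suggestion from a small inline edit box; the sketch keeps both as plain data so the logic stands alone.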
This includes a demo that proposes updates to video player UIs so viewers can easily give feedback, a backend that handles the inputs of an open-ended crowdsourced system in a productive manner, and updates to the caption file formats we use so that this feedback is captured effectively. One day in the future, every video will be captioned 100% correctly by automation. But until that day, it’s on us to incorporate simple feedback systems so that every video has the chance to be captioned correctly!

This talk was presented at Demuxed ’22, a conference for video nerds in San Francisco featuring amazing talks like this one. Demuxed ’22 was made possible by sponsors like our Platinum sponsor Daily (https://daily.co) and organized by people from Mux (https://mux.com). For more information about the conference and community, see https://2022.demuxed.com.
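One way a crowdsourced backend like the one described could handle open-ended input "in a productive manner" is to hold each suggestion until several independent viewers agree on the same text. A minimal sketch, assuming a simple vote-count threshold (CorrectionStore and its API are hypothetical, not the talk’s actual design):

```typescript
// Hypothetical sketch of a crowdsourced-correction backend: a suggestion is
// accepted only once `threshold` viewers have proposed the same text for
// the same cue, which filters out one-off typos and vandalism.

interface Suggestion {
  cueId: string;        // e.g. "<startTime>-<endTime>" of the cue
  suggestedText: string;
}

class CorrectionStore {
  // votes.get(cueId).get(suggestedText) = number of viewers proposing it
  private votes = new Map<string, Map<string, number>>();

  constructor(private threshold: number) {}

  // Record one viewer's suggestion; return the accepted text once this
  // suggestion reaches the agreement threshold, otherwise null.
  submit(s: Suggestion): string | null {
    const byText = this.votes.get(s.cueId) ?? new Map<string, number>();
    const count = (byText.get(s.suggestedText) ?? 0) + 1;
    byText.set(s.suggestedText, count);
    this.votes.set(s.cueId, byText);
    return count >= this.threshold ? s.suggestedText : null;
  }
}
```

An accepted correction could then be written back into the caption file, and each (cueId, acceptedText) pair doubles as labeled training data for the recognizer.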

Conference

Demuxed 2022

Speakers

Leon Lyakovetsky

Learning Categories

Metadata
Accessibility
Closed Captions
Subtitles

Other Proceedings

Here are some other proceedings that you might find interesting.

What Codec Should I Use?

Alan Resnick

Doing Server-Side Ad Insertion on Live Sports for 25.3M Concurrent Users

Ashutosh Agrawal

Is now the time to solve the deepfake threat?

Roderick Hodgson

Super Resolution: The scaler of tomorrow, here today!

Nick Chadwick

The do's and don'ts about Streaming security

Javier Brines Garcia

Modeling the conceptual structure of FFmpeg in JavaScript

Ryan Harvey

Objectionable Uses of Objective Quality Metrics

Richard Fliam

RTMP: web video innovation or Web 1.0 hack… how did we get to now?

Sarah Allen

Large-Scale Media Archive Migration to the Cloud

Konstantin Wilms

HEVC Upload Experiments

Chris Ellsworth



User Area

  • Account
  • FAQs
  • Orders
  • Registration

Resources

  • About
  • FAQs
  • Legal Hub
  • Support
  • How-To Take A Course
  • How-To Navigate the Interface

SVTA Sites

  • Diversity and Inclusion
  • LABS
  • OATC
  • Open Caching
  • SEGMENTS
  • Streaming Video Wiki
  • SVTA Fellows
  • SVTA University

© Copyright Streaming Video Technology Alliance (SVTA).

About the SVTA University

The SVTA University (SVTAU) is an educational arm of the Streaming Video Technology Alliance, providing courses and other instructional content related to understanding and working with components within the streaming video stack.

About the SVTA

The Streaming Video Technology Alliance is a global technical association committed to bringing video streaming companies together to help build a better viewer experience at scale. Find out more at www.svta.org.

