An end-to-end learnable video codec whose base layer supports machine vision tasks and whose enhancement layer reconstructs the input video for human viewing. This layered design introduces a new video coding paradigm that efficiently represents and compresses video for both machine and human use.