
EncodedVideoChunk

The EncodedVideoChunk class, the other main type in WebCodecs, represents the compressed (or “encoded”) version of a single VideoFrame.

The EncodedVideoChunk contains binary data (the encoded VideoFrame) and some metadata, and there is a 1:1 correspondence between EncodedVideoChunk and VideoFrame objects - if you encode 100 VideoFrame objects, you should expect 100 EncodedVideoChunk objects from the encoder.

Unlike a VideoFrame, an EncodedVideoChunk can’t be directly rendered or displayed because its data is encoded, but chunks can be read from video files (via demuxing) or written to them (via muxing).

EncodedVideoChunks are not, by themselves, video files.

You cannot just encode a bunch of video frames, store the chunks in a Blob, and call it a day.

// This will not work!
async function encodeVideo(frames: VideoFrame[]) {
  const chunks = <EncodedVideoChunk[]> await encodeFrames(<VideoFrame[]> frames);
  return new Blob(chunks, { type: "video/mp4" }); // Not how this works
}

If you want to write your encoded video chunks to a video file, that requires an additional step called muxing. There are libraries that do this for you; we’ll get to those in the next section.

For now, keep in mind that WebCodecs focuses just on codecs, and codecs mean compression: WebCodecs will only help you transform raw video data into compressed (encoded) video data and vice versa.

You might think “that’s annoying”, since WebCodecs doesn’t provide a complete solution, but keep in mind that muxing and other utilities are easily implemented as third-party libraries. What a library can’t do is access hardware-accelerated video encoding or decoding without the browser’s help, and hardware acceleration is exactly what WebCodecs helps with.


Also, WebCodecs is a low-level API, so it’s intentionally minimal. Use MediaBunny for easy mode.

When streaming video data, you don’t even need muxing or a video file; the EncodedVideoChunk is useful by itself as-is.

Consider the following mock example of streaming video from a canvas in one worker to another. Here we render an animation in the source worker, send raw VideoFrame objects to the destination worker, and then render each raw VideoFrame on the destination canvas.

Source Code

Here is the pseudocode for the two workers (full code here):

function render() {
  sourceCtx.clearRect(0, 0, sourceCanvas.width, sourceCanvas.height);
  sourceCtx.fillText(`Frame ${frameNumber}`, 20, sourceCanvas.height / 2);
  const videoFrame = new VideoFrame(sourceCanvas, {
    timestamp: frameNumber * (1e6 / frameRate) // timestamps are in microseconds
  });
  // Transfer the frame to the destination (the second argument transfers ownership)
  self.postMessage(videoFrame, [videoFrame]);
  frameNumber++;
  requestAnimationFrame(render);
}
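The destination worker’s side is not shown above, but it simply receives each VideoFrame in a message handler and draws it to its own canvas. Here is a minimal sketch, assuming the worker owns an OffscreenCanvas called destCanvas (a hypothetical name, transferred in from the main thread):

// destination-worker (sketch)
const destCtx = destCanvas.getContext('2d');

self.onmessage = (event: MessageEvent<VideoFrame>) => {
  const frame = event.data;
  // A VideoFrame is a valid CanvasImageSource, so it can be drawn directly
  destCtx.drawImage(frame, 0, 0);
  // Close the frame when done to release the underlying memory
  frame.close();
};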

Here’s a quick animation to visualize the data flow:

When sending raw, uncompressed 320x240 video, we are sending about 9000 kilobytes per second, or roughly 72 megabits per second: around the bitrate you’d expect from the studio-quality 4K video professional editors work with, and about as much as a real-world fiber-optic connection can realistically sustain.
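That 9000 kB/s figure falls straight out of the frame dimensions. A rough back-of-the-envelope calculation, assuming RGBA pixels (4 bytes each) and 30 frames per second:

const width = 320;
const height = 240;
const bytesPerPixel = 4;      // RGBA
const framesPerSecond = 30;

const bytesPerFrame = width * height * bytesPerPixel;     // 307,200 bytes per frame
const bytesPerSecond = bytesPerFrame * framesPerSecond;   // ~9,200,000 bytes ≈ 9000 kB/s
const megabitsPerSecond = (bytesPerSecond * 8) / 1e6;     // ≈ 72–74 Mbit/s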

Let’s take the exact same example, but now we encode the video frames into chunks before sending them between workers.

Source Code

Here is the pseudocode for the two workers (full code here):

const encoder = new VideoEncoder({
  output: (chunk, metadata) => {
    // Send the encoded chunk to the destination worker
    self.postMessage(chunk);
  },
  error: (e) => console.error('Encoder error:', e)
});
// encoder.configure({ ... }) is omitted here; see the full code

function render() {
  sourceCtx.clearRect(0, 0, sourceCanvas.width, sourceCanvas.height);
  sourceCtx.fillText(`Frame ${frameNumber}`, 20, sourceCanvas.height / 2);
  const videoFrame = new VideoFrame(sourceCanvas, {
    timestamp: frameNumber * (1e6 / frameRate) // timestamps are in microseconds
  });
  encoder.encode(videoFrame, { keyFrame: frameNumber % 60 === 0 }); // force a key frame every 60 frames
  videoFrame.close(); // the raw frame is no longer needed once handed to the encoder
  frameNumber++;
  requestAnimationFrame(render);
}
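On the receiving side, the destination worker feeds each incoming chunk into a VideoDecoder, which turns it back into a VideoFrame to draw. A minimal sketch, again assuming a hypothetical destCanvas owned by the worker, and assuming the encoder was configured for VP8 at 320x240 (whatever config the encoder uses, the decoder’s must match it):

// destination-worker (sketch)
const destCtx = destCanvas.getContext('2d');

const decoder = new VideoDecoder({
  output: (frame: VideoFrame) => {
    destCtx.drawImage(frame, 0, 0);
    frame.close();
  },
  error: (e) => console.error('Decoder error:', e)
});

// Must match the encoder's configuration
decoder.configure({ codec: 'vp8', codedWidth: 320, codedHeight: 240 });

self.onmessage = (event: MessageEvent<EncodedVideoChunk>) => {
  decoder.decode(event.data);
};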

Here’s what the data flow looks like when we add in the encoder/decoder:

As you can see, encoding the video reduces the bandwidth by roughly 1000x (9000 kB/s vs 9 kB/s).

In the real world, if you were actually streaming 4K video, the raw stream would be ~7 gigabits per second (no home internet connection could keep up), while an encoded stream would be around 10 megabits per second: hundreds of times smaller, and something most home internet connections can handle without issue.

We won’t get into how these compression algorithms actually work (see [here] if you want to learn more), but a core feature of all the major codecs supported by web browsers is that they don’t encode each video frame independently; instead, they encode the differences between frames.

Consider again one of the simplest possible videos:



If you look at any two consecutive frames, these frames are pretty similar, with most of the pixels actually being identical.

You might be able to imagine how, with some clever engineering, you could formulate a way to calculate just the difference between these frames (e.g. what changes from frame 1 to frame 2).

That way you don’t actually need to store frame 2 at all: you store the first frame plus the difference between frame 1 and frame 2, which is enough to reconstruct frame 2.

To send a full video, you could send the first frame (called a key frame), and then just keep sending “frame differences” (called delta frames).

This is exactly what real-world codecs do, with delta frames typically being many times smaller than key frames.
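To make the idea concrete, here is a toy illustration (nothing like how a real codec works internally, but the same principle) of storing only the per-byte difference between two raw frames and using it to reconstruct the second frame:

// Toy illustration only: real codecs use motion estimation, transforms, and
// entropy coding rather than naive per-byte subtraction.
function makeDelta(prev: Uint8Array, next: Uint8Array): Uint8Array {
  const delta = new Uint8Array(prev.length);
  for (let i = 0; i < prev.length; i++) {
    delta[i] = (next[i] - prev[i] + 256) % 256; // wrap-around difference
  }
  return delta; // mostly zeros when frames are similar, so it compresses well
}

function applyDelta(prev: Uint8Array, delta: Uint8Array): Uint8Array {
  const next = new Uint8Array(prev.length);
  for (let i = 0; i < prev.length; i++) {
    next[i] = (prev[i] + delta[i]) % 256;
  }
  return next; // frame 2 reconstructed from frame 1 plus the difference
}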

The EncodedVideoChunk represents this property via its type attribute: key frames have the type "key", and delta frames have the type "delta".

import { getVideoChunks } from 'webcodecs-utils'
const chunks = <EncodedVideoChunk[]> await getVideoChunks(<File> file);
console.log(chunks[0].type); //"key"
console.log(chunks[1].type); //"delta"
console.log(chunks[2].type); //"delta"

Typically, videos insert a fresh key frame every 30 to 60 frames, though this is something you can control in the VideoEncoder.

An important consequence is that you can’t just decode a delta frame by itself. To reconstruct a delta frame, you need to decode every single frame, in order, from the previous key frame up to the delta frame you want; we’ll cover this later in playback design.

So while there is a 1:1 correspondence between each EncodedVideoChunk and each VideoFrame in a file (decoding 100 EncodedVideoChunk objects produces 100 VideoFrame objects, and encoding 100 VideoFrame objects produces 100 EncodedVideoChunk objects), you can’t deal with an EncodedVideoChunk in isolation the way you can with a VideoFrame.

You need to work with EncodedVideoChunk objects as a sequence, and when decoding, you need to decode them in the exact correct order.
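For example, to display frame 45 when the nearest preceding key frame is frame 30, you have to decode frames 30 through 45, in order. A sketch of that pattern, assuming chunks is the full, ordered array of EncodedVideoChunk objects for the track and the decoder is already configured:

// Sketch: decode everything from the nearest preceding key frame up to the target.
function decodeUpTo(decoder: VideoDecoder, chunks: EncodedVideoChunk[], target: number) {
  // Walk backwards from the target to find the key frame we must start from
  // (the very first chunk in a stream is always a key frame)
  let start = target;
  while (start > 0 && chunks[start].type !== 'key') {
    start--;
  }
  // Decode every chunk from that key frame through the target, in order
  for (let i = start; i <= target; i++) {
    decoder.decode(chunks[i]);
  }
}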

You can get an EncodedVideoChunk by either encoding a video or demuxing a source file.

VideoEncoder will naturally just give you EncodedVideoChunk objects ready to use; you never have to construct them yourself.

const encoder = new VideoEncoder({
  output: function (chunk: EncodedVideoChunk) {
    // Do something with the chunk
  },
  error: function (e) { console.log(e); }
});
encoder.configure(/* config */);
for await (const frame of getFrame()) { // however you get frames
  encoder.encode(frame);
  frame.close();
}

Demuxing libraries will also give you formatted EncodedVideoChunk objects. Here’s how to do it in MediaBunny.

import { EncodedPacketSink, Input, ALL_FORMATS, BlobSource } from 'mediabunny';

const input = new Input({
  formats: ALL_FORMATS,
  source: new BlobSource(<File> file),
});
const videoTrack = await input.getPrimaryVideoTrack();
const sink = new EncodedPacketSink(videoTrack);

for await (const packet of sink.packets()) {
  const chunk = <EncodedVideoChunk> packet.toEncodedVideoChunk();
}

There are other demuxing libraries; we’ll go into more detail in the next section.

If you know what you are doing, you can create valid EncodedVideoChunk objects by hand via the new EncodedVideoChunk() constructor. Manual construction might look something like this:

const [sampleDataOffset, sampleDataLength] = calculateSampleDataOffset(0); // first sample
const sampleData = file.slice(sampleDataOffset, sampleDataOffset + sampleDataLength);
const isKeyFrame = <boolean> getIsKeyFrame(sampleData);
const timeStamp = <number> getTimeStamp(sampleData);   // in milliseconds here
const frameData = <Uint8Array> getFrameData(sampleData);
const duration = <number> getDuration(sampleData);     // in milliseconds here
const chunk = new EncodedVideoChunk({
  type: isKeyFrame ? "key" : "delta",
  timestamp: timeStamp * 1e3, // EncodedVideoChunk timestamps are in microseconds
  data: frameData,
  duration: duration * 1e3
});

For a file, you’d typically have to build a parsing function for each container format (WebM, MP4) to extract this information. We’ll cover manual parsing (and why you probably shouldn’t do it) in the next section.

Alternatively, if you are streaming video, you know you are working with WebCodecs, and you control both the source and the destination, you don’t need fancy muxing or demuxing: you can build your own custom schema to keep track of the metadata (type, timestamp, duration) and data (Uint8Array) associated with each EncodedVideoChunk.

You can use EncodedVideoChunk objects by decoding them, muxing them, or manually processing them.

import { demuxVideo } from 'webcodecs-utils'

const { chunks, config } = await demuxVideo(file);
const decoder = new VideoDecoder({
  output(frame: VideoFrame) {
    // Do something with the frame
  },
  error(e) { console.error(e); }
});
decoder.configure(config);
for (const chunk of chunks) {
  decoder.decode(chunk);
}

You can also use chunks to mux to a file, and each muxing library has its own API.

// Use mediabunny for production; these are just simplified utils for learning
import { getVideoChunks, ExampleMuxer } from 'webcodecs-utils'

const chunks = <EncodedVideoChunk[]> await getVideoChunks(file);
const muxer = new ExampleMuxer();
for (const chunk of chunks) {
  muxer.addChunk(chunk);
}
const arrayBuffer = await muxer.finish();
const blob = new Blob([arrayBuffer], { type: 'video/mp4' });

Again, we’ll cover muxing in the next section.

Finally, for more control, you can manually extract the data from an EncodedVideoChunk and send it somewhere else (like over a network, for streaming):

const destinationBuffer = new Uint8Array(chunk.byteLength);
chunk.copyTo(destinationBuffer);
sendSomewhere({
  data: <Uint8Array> destinationBuffer,
  type: chunk.type,
  duration: chunk.duration,   // don't forget this is in microseconds
  timestamp: chunk.timestamp  // also in microseconds
});

You could theoretically stream this data (the actual buffer plus metadata) over a network to another browser instance and reconstruct the EncodedVideoChunk using the manual construction method shown earlier.
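On the receiving end, reconstruction is just the constructor call from earlier, fed with the fields you sent. A sketch, assuming the incoming message has the same shape as the sendSomewhere payload above:

// Sketch: rebuild an EncodedVideoChunk from fields received over the network
function reconstructChunk(message: {
  data: Uint8Array;
  type: 'key' | 'delta';
  timestamp: number; // microseconds
  duration: number;  // microseconds
}): EncodedVideoChunk {
  return new EncodedVideoChunk({
    type: message.type,
    timestamp: message.timestamp,
    duration: message.duration,
    data: message.data
  });
}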