Intro to Audio
Up until now we’ve been focusing exclusively on video because, well, video is hard enough on its own without the additional challenge of managing audio, let alone handling audio-video sync.
I also saved it for later because, and I can’t emphasize this enough:
You may not need WebCodecs for Audio
Let me explain.
What WebCodecs Audio does
WebCodecs provides AudioEncoder and AudioDecoder, which let you encode raw audio into encoded audio, and decode encoded audio back into raw audio.
That may seem obvious, but here is a key limitation:
WebCodecs only supports AAC audio for MP4 files and Opus audio for WebM files. These are the most common audio codecs used with those container formats, but WebCodecs won’t handle MP3 or other audio formats.
If you’re only working with MP4 or WebM video files, this is fine. If you want to export standalone audio, you’ll need separate libraries to handle MP3 encoding (covered here).
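If you want to see this limitation for yourself, you can probe the browser with AudioEncoder.isConfigSupported(). Here’s a minimal sketch; the codec strings and config values are just illustrative, and exact support varies per browser:

```ts
// Probe which audio codecs this browser's AudioEncoder will accept.
// 'mp4a.40.2' is the usual codec string for AAC-LC (MP4); 'opus' is used for WebM.
async function checkAudioEncoderSupport(): Promise<void> {
  const baseConfig = { sampleRate: 48_000, numberOfChannels: 2, bitrate: 128_000 };

  for (const codec of ['mp4a.40.2', 'opus', 'mp3']) {
    try {
      const { supported } = await AudioEncoder.isConfigSupported({ codec, ...baseConfig });
      console.log(codec, supported ? 'supported' : 'not supported');
    } catch {
      // Some browsers reject outright for codec strings they don't recognise
      console.log(codec, 'not supported');
    }
  }
}
```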
If you’re only working with audio, you might be better off with the WebAudio API.
Web Audio API
The WebAudio API is a completely different API for decoding and playing audio in the browser, as well as applying custom audio processing and filters.
Unhelpfully, while there is some overlap between these APIs (both can be used to decode audio), they don’t really talk to each other. For example, WebCodecs represents raw audio via the AudioData object, but you can’t play back an AudioData in the browser. You need WebAudio to play audio in the browser, and WebAudio uses AudioBuffer, a completely different class for representing raw audio.
You can convert AudioData to AudioBuffer with some hacky Float32Array gymnastics, but it takes CPU effort, and you can’t do it in a worker because WebAudio is only available on the main thread.
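For the curious, here’s roughly what that conversion looks like. This is a sketch that assumes the AudioData can be copied out as planar 32-bit floats via copyTo with format: 'f32-planar'; real code would need more care around sample formats and timing:

```ts
// Copy each channel out of a WebCodecs AudioData into a WebAudio AudioBuffer.
// Must run on the main thread, since AudioContext/AudioBuffer aren't available in workers.
function audioDataToAudioBuffer(data: AudioData, ctx: AudioContext): AudioBuffer {
  const buffer = ctx.createBuffer(
    data.numberOfChannels,
    data.numberOfFrames,
    data.sampleRate,
  );

  for (let channel = 0; channel < data.numberOfChannels; channel++) {
    const channelData = new Float32Array(data.numberOfFrames);
    data.copyTo(channelData, { planeIndex: channel, format: 'f32-planar' });
    buffer.copyToChannel(channelData, channel);
  }

  return buffer;
}
```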
You’re better off ignoring WebCodecs entirely and just using WebAudio for playback, which we’ll cover in the WebAudio section.
When to use which
Transcoding
If you are just transcoding a video (or applying a video filter), you may not even need to decode and re-encode the audio. You can literally pass source EncodedAudioChunk objects from a demuxer straight into the muxer for the video file you want to write.
```ts
// This is using a demo muxer & demuxer; for prod, use a library like MediaBunny
import { getAudioChunks, ExampleMuxer } from 'webcodecs-utils';

async function transcodeFile(file: File) {
  const audio_chunks = await getAudioChunks(file) as EncodedAudioChunk[];
  const muxer = new ExampleMuxer('audio');

  for (const chunk of audio_chunks) {
    muxer.addChunk(chunk); // That's it!
  }
}
```

This is what I do with my free upscaling tool, see the source code here.
We’ll cover this pattern in more detail here
Playback
If you’re building a video player, or a video editor where you play back the current composition, you likely wouldn’t touch WebCodecs for the audio at all; it’d be much better to use WebAudio, which we’ll talk about here.
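To give a sense of why: basic playback with WebAudio is only a few lines, because the browser handles demuxing and decoding for you. A minimal sketch:

```ts
// Decode an audio file and play it through the default output.
async function playAudioFile(file: File): Promise<void> {
  const ctx = new AudioContext();

  // decodeAudioData demuxes and decodes for you (MP3, AAC, etc.)
  const audioBuffer = await ctx.decodeAudioData(await file.arrayBuffer());

  const source = ctx.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(ctx.destination);
  source.start();
}
```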
We’ll cover playback in more detail here
Audio Only
If you want to, say, do audio editing or audio transcoding, where you read in, process, and export audio files such as MP3s, AudioEncoder and AudioDecoder won’t help. You’d need to use 3rd party libraries to handle those files (more on that here).
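Just to make that concrete, here’s a rough sketch of MP3 encoding with a 3rd party library. I’m using lamejs (a JavaScript port of the LAME encoder) purely as an example; treat the import shape and API details here as assumptions and check the library’s docs before relying on them:

```ts
// Hypothetical sketch: encode mono 16-bit PCM samples to an MP3 Blob with lamejs.
import lamejs from 'lamejs';

function encodeMonoMp3(samples: Int16Array, sampleRate: number): Blob {
  // 1 channel, input sample rate, 128 kbps output
  const encoder = new lamejs.Mp3Encoder(1, sampleRate, 128);
  const chunks: Int8Array[] = [];

  const mp3Data = encoder.encodeBuffer(samples);
  if (mp3Data.length > 0) chunks.push(mp3Data);

  // flush() returns whatever the encoder was still buffering
  const remaining = encoder.flush();
  if (remaining.length > 0) chunks.push(remaining);

  return new Blob(chunks, { type: 'audio/mpeg' });
}
```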
Audio + Video
If you’re building transcoding software that handles both video inputs and standalone audio inputs, and/or your application outputs video as well as standalone audio, you’ll likely need to use both WebCodecs and 3rd party libraries to handle MP3 encoding/decoding.
Here, audio-only is not a subset or simpler case of video+audio; instead, audio-only imports/exports require additional pipelines and complexity.
Multimedia Editing
If you’re building software that lets users import audio-only and video sources, provides real-time playback/preview of the composition, and supports exporting to both video and audio-only outputs, then you’ll need to combine a number of things:
- WebCodecs for real-time playback of video
- WebAudio for real-time playback of audio
- WebCodecs for video exports
- 3rd party libraries for audio-only exports
We’ll provide more detail on editing here
Choose your own adventure
Because the solutions for audio differ based on use case, I wanted to provide this section up front, as not all of the following sections may be necessary. Consider the audio section of this guide a “Choose your own adventure”.
- You can skip this entire section if you use MediaBunny, though the docs may still be helpful to understand fundamentals
- If you don’t need to re-encode audio at all (e.g. video transcoding), feel free to skip the section entirely
- If you only care about playback and aren’t encoding audio, feel free to skip straight to playback
- If you only will be working with audio, feel free to skip straight to this section
Otherwise, let’s continue and in the next section I’ll actually start talking about WebCodecs audio.