MPEG-H and AV1: is this the mux for the next generation?

I think so. I have written to Fraunhofer about it, and I am determined to create an MPEG-H audio file muxed with an 8K AV1 video file, using Matroska (MKV) as the container. I am hoping that Topaz Labs (or FFmpeg on the command line) will help me do this.

Why is this important?

MPEG-H Audio was developed primarily by Fraunhofer (just like MP3), but there is no charge for creators to use it. AV1 is an open, royalty-free codec from the Alliance for Open Media (AOMedia), and Matroska (MKV) is an open container format. So not only are all three free to use, they are also cutting-edge technology. Sony is running with MPEG-H in its 360 Reality Audio format, and (theoretically) AOMedia has a responsibility to protect the smallest amongst us (me) against the companies that try to invent a patent pool.

A company that combines, licenses, and sells rights to technologies like AV1 or MPEG-H is often referred to as a patent pool organization or a licensing consortium. These entities manage and pool intellectual property rights from multiple contributors, making it easier for third parties to license the technology collectively.

Examples include:

MPEG LA: Known for managing licenses for video compression standards like MPEG-2 and H.264.
HEVC Advance: Focused on HEVC (High-Efficiency Video Coding) licensing.
AOMedia: While not a traditional licensing entity, its AV1 codec is open and royalty-free, providing a collaborative approach to technology distribution.

These organizations simplify the complex process of acquiring rights for technologies that involve numerous patents from various stakeholders.

The working group on AV1 is part of the Alliance for Open Media (AOMedia), which is a consortium of organizations dedicated to developing open, royalty-free multimedia technologies. The founding members of AOMedia include major tech companies such as:

Amazon
Cisco
Google
Intel
Microsoft
Mozilla
Netflix

Over time, the alliance has expanded to include other prominent members like Apple, ARM, Meta Platforms, Nvidia, Samsung Electronics, Tencent, and Huawei.

The Fraunhofer MPEG-H Authoring Tool (MHAT) is software for creating immersive and interactive audio experiences. Its main features and capabilities are:

Interactive Audio Authoring: The tool allows you to create object- or channel-based audio productions. You can position audio objects dynamically in a 3D space, enabling immersive soundscapes.
Personalization Features: MPEG-H Audio supports adjustable dialogue levels, multiple language tracks, and customizable audio descriptions. These features let users tailor their listening experience to their preferences.
Metadata Integration: The tool enables the addition of MPEG-H metadata to existing audio material. This metadata is crucial for enabling interactivity and immersive features.
Real-Time Monitoring: You can instantly preview your configurations, ensuring that the audio mix meets your creative vision before finalizing it.
Export Options: The tool supports exporting in various formats, including MPEG-H Production Format (MPF) and MPEG-H BWF/ADM. These formats are ready for distribution across MPEG-H-enabled platforms.
Compatibility with DAWs: While the MHAT can function independently, it also integrates seamlessly with Digital Audio Workstations (DAWs) like Nuendo and Pro Tools through the MPEG-H Authoring Plug-in (MHAPi).
Quality Control Tools: The MPEG-H Production Format Player (MPF Player) allows you to verify audio-visual sync, check render layouts, and review scene authoring before encoding.
Conversion and Encoding: The MPEG-H Conversion Tool (MCO) and MPEG-H Encoding and Muxing Tool (MHEX) help convert and encode audio data into various formats for distribution.

Let’s dive deeper into the process of combining MPEG-H Audio with AV1 Video and the codecs, tools, and workflows involved.

1. Understanding the Workflow

Combining audio and video streams requires three main steps:

Encoding the AV1 video and MPEG-H audio.
Multiplexing (muxing) them into a compatible container.
Ensuring the container maintains synchronization and metadata for both formats.

2. Codec Support

AV1 Codec: Open, royalty-free video codec designed for high efficiency, supported by platforms like YouTube, Netflix, and hardware vendors.
- Encoding tools: libaom-av1, SVT-AV1, or rav1e.
- AV1 is especially suited for streaming due to its exceptional compression efficiency.
MPEG-H Audio Codec: A cutting-edge codec for immersive and interactive 3D audio, capable of offering customizable features like variable dialogue levels and multiple languages.
- MPEG-H encoding tools: The Fraunhofer MPEG-H Authoring Suite can prepare MPEG-H audio streams.
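
As a rough sketch of the video half of this step, the snippet below builds an FFmpeg command line that encodes a source file to AV1 through the libsvtav1 encoder. It assumes an FFmpeg build with SVT-AV1 enabled and a source that is already at the target resolution. The file names, the CRF of 30, and preset 6 are illustrative placeholders rather than recommendations, and the helper names run_or_print and encode_av1 are hypothetical. The MPEG-H side is not shown because it is prepared in the Fraunhofer tools.

```shell
#!/bin/sh
# Sketch only: file names and quality settings are placeholders.

# run_or_print: print the command by default; execute it only when RUN=1.
run_or_print() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# encode_av1 SRC DST [CRF] [PRESET]
# Assumes FFmpeg was built with libsvtav1 support.
encode_av1() {
  src=$1
  dst=$2
  crf=${3:-30}      # lower CRF means higher quality and larger files
  preset=${4:-6}    # lower preset means slower encoding, better compression
  # -an drops any audio so the result is a video-only file for later muxing.
  run_or_print ffmpeg -i "$src" -c:v libsvtav1 -crf "$crf" -preset "$preset" -an "$dst"
}

# Dry run: prints the command for inspection instead of running it.
encode_av1 source_8k.mov video_av1.mkv
```

Exporting RUN=1 before running the script executes the command instead of printing it.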

3. Combining Streams: Tools

The process of merging AV1 video and MPEG-H audio typically happens in a container format like:

Matroska (MKV): Widely supported and flexible.
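MP4: Streaming-friendly, and the container for which MPEG-H carriage is formally defined (the mha1 and mhm1 sample entries), but slightly less flexible for advanced metadata.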

Tools for Multiplexing:

FFmpeg: Open-source multimedia processing software ideal for combining video and audio.

Example: FFmpeg Command

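Here’s how to use FFmpeg to mux AV1 and MPEG-H. The input file names are placeholders, and the command only works if your FFmpeg build can read both inputs and its Matroska muxer accepts MPEG-H audio, so verify that before relying on it: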

ffmpeg -i input_video.av1 -i input_audio.mpegh -c:v copy -c:a copy output.mkv

-c:v copy: Copies the AV1 video stream without re-encoding.
-c:a copy: Copies the MPEG-H audio stream without re-encoding.
output.mkv: Resultant file with AV1 video and MPEG-H audio.
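
A slightly more explicit variant of the command above might look like the following sketch. It adds -map options so that exactly one video stream from the first input and one audio stream from the second are selected, rather than relying on FFmpeg's automatic stream selection. The file names and the helper name mux_av1_mpegh are hypothetical, and whether stream copy of MPEG-H into Matroska succeeds is an assumption that depends on the FFmpeg version and build.

```shell
#!/bin/sh
# Sketch only: file names are placeholders, and MPEG-H-in-MKV support
# in your FFmpeg build is an assumption to verify.

# mux_av1_mpegh VIDEO AUDIO OUT
# Prints the mux command by default; runs it only when RUN=1.
mux_av1_mpegh() {
  video=$1
  audio=$2
  out=$3
  # -map 0:v:0 picks the first video stream of the first input,
  # -map 1:a:0 picks the first audio stream of the second input,
  # -c copy copies both streams without re-encoding.
  set -- ffmpeg -i "$video" -i "$audio" -map 0:v:0 -map 1:a:0 -c copy "$out"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Dry run: prints the command for inspection instead of running it.
mux_av1_mpegh video_av1.mkv audio_mpegh.mp4 output.mkv
```

If FFmpeg rejects the audio stream for the MKV output, switching the output name to an .mp4 file is one fallback to try, since MP4 is the other container discussed above.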

4. Important Metadata for MPEG-H

MPEG-H audio requires specific metadata to enable features like immersive 3D audio and interactivity.
Tools like the MPEG-H Authoring Suite ensure the correct metadata is embedded before muxing.

5. Ensuring Compatibility

Playback: Make sure the playback software (e.g., VLC Media Player) or hardware supports both AV1 and MPEG-H.
Testing: Use tools like MediaInfo to verify the properties of the combined file.
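
The testing step can also be scripted. The sketch below uses ffprobe (which ships with FFmpeg) instead of MediaInfo, purely because its output is easy to parse. It assumes ffprobe reports the streams as av1 and mpegh_3d_audio and prints the codec name before the codec type; the helper name check_streams is hypothetical.

```shell
#!/bin/sh
# Sketch only: the codec names matched below are assumptions about
# how ffprobe labels AV1 and MPEG-H streams.

# check_streams: reads "codec_name,codec_type" lines on stdin and succeeds
# only if both an AV1 video stream and an MPEG-H audio stream are present.
check_streams() {
  report=$(cat)
  if ! printf '%s\n' "$report" | grep -q '^av1,video'; then
    echo "missing AV1 video stream"
    return 1
  fi
  if ! printf '%s\n' "$report" | grep -q '^mpegh_3d_audio,audio'; then
    echo "missing MPEG-H audio stream"
    return 1
  fi
  echo "ok: AV1 video and MPEG-H audio found"
}

# Demonstration with sample text in the assumed ffprobe format.
printf 'av1,video\nmpegh_3d_audio,audio\n' | check_streams

# With a real file (requires ffprobe, so it is left commented out):
# ffprobe -v error -show_entries stream=codec_name,codec_type \
#   -of csv=p=0 output.mkv | check_streams
```

A passing check only confirms that both streams are present in the container; it says nothing about whether a given player can decode them.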

6. Use Cases

Broadcast: Combining AV1 and MPEG-H is ideal for next-generation TV broadcasts with 4K video and immersive audio.
Streaming: Platforms like YouTube and Netflix already deliver AV1 video, and MPEG-H can add immersive, personalizable audio on top of it.