<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="rfc7991bis.rnc"?>
<!DOCTYPE rfc [
<!ENTITY docname "draft-swhited-mka-stems-02">
]>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="info" docName="&docname;" ipr="trust200902" obsoletes="" updates="" submissionType="IETF" xml:lang="en" version="3">
  <front>
    <title abbrev="MKA Stem">Matroska Stem Files</title>
    <seriesInfo name="Internet-Draft" value="&docname;"/>
    <author fullname="Sam Whited" initials="ssw" role="editor" surname="Whited">
      <address>
        <email>sam@samwhited.com</email>
        <uri>https://blog.samwhited.com</uri>
      </address>
    </author>
    <date year="2026" month="2" day="27"/>
    <area>General</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <keyword>audio</keyword>
    <keyword>ogg</keyword>
    <keyword>stems</keyword>
    <keyword>djing</keyword>
    <abstract>
      <t>
        This document defines a multi-track profile of the Matroska container
        format for storing stems that is also backwards compatible with
        existing media players.
      </t>
    </abstract>
  </front>
  <middle>
    <section>
      <name>Introduction</name>
      <t>
        Stems are recordings of individual instruments, or clusters of
        instruments, used by DJs and music producers for live mixing of music.
        Historically, stems have been stored as individual audio files or in
        patent-encumbered or vendor-specific proprietary container formats.
        The Matroska container format, formally specified in
        <xref target="RFC9559"/>, is ideally suited as a container for stems.
      </t>
      <t>
        This specification documents a profile of the Matroska container
        format that allows it to store lossless or lossy stems, as well as
        metadata about the stems, for use in DJ applications or Digital
        Audio Workstations.
      </t>
      <section anchor="requirements">
        <name>Requirements Language</name>
        <t>
          The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
          "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>",
          "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>",
          "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>",
          "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and
          "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
          described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/>
          when, and only when, they appear in all capitals, as shown here.
        </t>
      </section>
    </section>
    <section>
      <name>Requirements</name>
      <t>
        Stem files have a few basic requirements:
      </t>
      <ul spacing="normal">
        <li>Backwards compatibility with existing media players</li>
        <li>The ability to store at least 5 audio tracks</li>
        <li>The ability to synchronize playback of multiple audio tracks</li>
        <li>The ability to store file-level metadata and per-stem metadata</li>
      </ul>
    </section>
    <section>
      <name>Track Layout</name>
      <section>
        <name>Audio Streams</name>
        <t>
          A stem file may contain any number of audio tracks but
          <bcp14>MUST</bcp14> include at least 3 tracks (the mixed
          audio and at least two stems).
        </t>
        <t>
          Each track <bcp14>SHOULD</bcp14> be encoded using the same codec with
          the same parameters, including bitrate, channel count, channel layout,
          and sample rate.
        </t>
        <t>
          The first track containing audio data <bcp14>MUST</bcp14> be the final
          post-mix audio in the default language.
          All tracks containing the final post-mix audio regardless of language
          <bcp14>MUST</bcp14> have the Matroska default flag set
          (<xref target="RFC9559" sectionFormat="comma" section="18.1"/>).
          This preserves backwards compatibility with media players that do
          not support this format, which typically play the first audio stream
          found or select a stream based on the default flag.
        </t>
        <t>
          The remaining tracks will be individual stems and <bcp14>MUST</bcp14>
          have the same effective length as the first track such that playing
          each stem track from the beginning would result in the same audio
          (excluding mastering) as the final mix present in the first track.
          For example, if the original track is 3 minutes long and the
          stem file includes a percussion track in which the percussion does
          not start until 1 minute in, the percussion stem would still be
          3 minutes long.
          It would either contain a minute of silence at the start of the
          track or have a block timestamp
          (<xref target="RFC9559" sectionFormat="comma" section="10"/>)
          that starts it at 1 minute.
        </t>
        <t>
          Each stem track <bcp14>MUST NOT</bcp14> have the Matroska default flag
          set.
        </t>
        <t>
          The stem tracks <bcp14>SHOULD NOT</bcp14> have any gain normalization
          applied.
          Instead, they should retain the same levels as they have in the
          final mix present in the first track, so that if all stems were
          played at unity gain the levels would be equivalent to the final mix.
        </t>
        <t>
          Each stem track (i.e., every track that is not the first track)
          <bcp14>MUST</bcp14> set the value of the
          <tt>\Segment\Tracks\TrackEntry\Name</tt> field to a
          human-readable track name for the stem, for example "Percussion" or
          "Vocals".
        </t>
        <t>
          For each stem track, a <tt>\Segment\Tags\Tag</tt> element must also
          be present with its target set to the stem track.
          The tag must contain a <tt>SimpleTag</tt> element with the
          <tt>TagName</tt> field set to "StemColor" and the <tt>TagString</tt>
          field set to a color representing the track in RGB hex format
          (e.g., "#145374").
        </t>
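        <t>
          As a non-normative illustration, the per-stem color tag described
          above might be written as follows in the XML tag file format accepted
          by MKVToolNix tools such as <tt>mkvmerge</tt> and
          <tt>mkvpropedit</tt>.
          This is only a sketch: the choice of tool is an assumption of this
          example rather than part of the profile, and the
          <tt>TrackUID</tt> value is a hypothetical placeholder that would be
          replaced by the UID of the actual stem track.
        </t>
        <sourcecode type="xml"><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<Tags>
  <Tag>
    <Targets>
      <!-- Hypothetical UID of the stem track this tag applies to -->
      <TrackUID>1234567890</TrackUID>
    </Targets>
    <Simple>
      <!-- TagName -->
      <Name>StemColor</Name>
      <!-- TagString: color in RGB hex format -->
      <String>#145374</String>
    </Simple>
  </Tag>
</Tags>
]]></sourcecode>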
      </section>
    </section>
    <section>
      <name>Digital Signal Processor</name>
      <t>
        Because mastering happens post-mix and the stems are pre-mix audio, the
        stem tracks <bcp14>SHOULD NOT</bcp14> have any mastering steps applied.
        Instead, metadata for configuring a compressor and limiter
        <bcp14>SHOULD</bcp14> be included in the file's global metadata as
        simple tags.
        After mixing, playback applications <bcp14>MAY</bcp14> choose to feed
        the mix through a Digital Signal Processor configured with the limiter
        and compressor settings read from the metadata.
      </t>
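      <t>
        As a non-normative sketch, global tags using the
        <tt>STEM:COMPRESSOR:ENABLED</tt> and <tt>STEM:LIMITER:ENABLED</tt>
        names defined in the following subsections might look like this in the
        XML tag file format accepted by MKVToolNix tools.
        The use of that tool format and the grouping of both simple tags into
        a single tag are assumptions of this example.
        Only the tags whose values are currently defined are shown.
      </t>
      <sourcecode type="xml"><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<Tags>
  <Tag>
    <!-- Empty Targets: the tag applies to the whole file -->
    <Targets/>
    <Simple>
      <Name>STEM:COMPRESSOR:ENABLED</Name>
      <String>TRUE</String>
    </Simple>
    <Simple>
      <Name>STEM:LIMITER:ENABLED</Name>
      <String>FALSE</String>
    </Simple>
  </Tag>
</Tags>
]]></sourcecode>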
      <section>
        <name>Compressor Metadata</name>
        <table>
          <thead>
            <tr>
              <th>Tag</th>
              <th>Requirement Level</th>
              <th>Values</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>STEM:COMPRESSOR:ENABLED</td>
              <td>
                <bcp14>REQUIRED</bcp14>
              </td>
              <td>"TRUE" or "FALSE"</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:RATIO</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:OUTPUT_GAIN</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:THRESHOLD</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:ATTACK</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:INPUT_GAIN</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:RELEASE</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:HP_CUTOFF</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:HP_DRY_WET</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
          </tbody>
        </table>
      </section>
      <section>
        <name>Limiter Metadata</name>
        <table>
          <thead>
            <tr>
              <th>Tag</th>
              <th>Requirement Level</th>
              <th>Values</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>STEM:LIMITER:ENABLED</td>
              <td>
                <bcp14>REQUIRED</bcp14>
              </td>
              <td>"TRUE" or "FALSE"</td>
            </tr>
            <tr>
              <td>STEM:LIMITER:RELEASE</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
            <tr>
              <td>STEM:LIMITER:THRESHOLD</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
            <tr>
              <td>STEM:LIMITER:CEILING</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>TODO</td>
            </tr>
          </tbody>
        </table>
      </section>
    </section>
    <section anchor="IANA">
      <name>IANA Considerations</name>
      <t>This memo includes no request to IANA.</t>
    </section>
    <section anchor="Security">
      <name>Security Considerations</name>
      <t>This document should not affect the security of the Internet.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>Normative References</name>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9559.xml"/>
    </references>
    <section anchor="Acknowledgements" numbered="false">
      <name>Acknowledgements</name>
      <t>
        Thanks to the members of <tt>#matroska</tt> on the <tt>libera.chat</tt>
        IRC network for patiently explaining the basics of the format to me.
      </t>
    </section>
  </back>
</rfc>
