<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="rfc7991bis.rnc"?>

<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp    "&#8203;">
  <!ENTITY nbhy    "&#8209;">
  <!ENTITY wj      "&#8288;">
  <!ENTITY docname "draft-swhited-ogg-stems-00">
]>
<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  category="info"
  docName="&docname;"
  ipr="trust200902"
  obsoletes=""
  updates=""
  submissionType="IETF"
  xml:lang="en"
  version="3">

  <front>
    <title>Ogg Stem Files</title>
    <seriesInfo name="Internet-Draft" value="&docname;"/>
    <author fullname="Sam Whited" initials="ssw" role="editor" surname="Whited">
      <address>
        <email>sam@samwhited.com</email>
        <uri>https://blog.samwhited.com</uri>
      </address>
    </author>

    <date year="2026" month="2" day="17"/>
    <area>General</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <keyword>audio</keyword>
    <keyword>ogg</keyword>
    <keyword>stems</keyword>
    <keyword>djing</keyword>

    <abstract>
      <t>
        This document defines a multi-track profile of the Ogg container format
        for storing stems that remains backwards compatible with existing media
        players.
      </t>

    </abstract>
  </front>

  <middle>
    <section>
      <name>Introduction</name>
        <t>
          Stems are recordings of individual instruments, or clusters of
          instruments, used by DJs and music producers for live mixing of music.
          Historically, stems have been stored as individual audio files or in
          patent-encumbered or vendor-specific proprietary container formats.
          The Ogg container format developed by the Xiph.Org Foundation is
          specified in <xref target="RFC3533"/>, with its media types defined in
          <xref target="RFC5334"/>, and is well suited as a container for stems.
          This specification documents a profile of the Ogg container format
          that allows it to store lossless or lossy stems as well as metadata
          about the stems for use in DJ applications or Digital Audio
          Workstations.
        </t>

      <section anchor="requirements">
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
          "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
          RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
          interpreted as described in BCP 14 <xref target="RFC2119"/>
          <xref target="RFC8174"/> when, and only when, they appear in
          all capitals, as shown here.</t>
      </section>
    </section>

    <section>
      <name>Requirements</name>
      <t>
        Stem files have a few basic requirements:
      </t>
      <ul spacing="normal">
        <li>Backwards compatibility with existing media players</li>
        <li>The ability to store at least 5 stereo audio tracks</li>
        <li>The ability to synchronize multiple audio tracks</li>
        <li>The ability to store global metadata and per-stem metadata</li>
      </ul>
    </section>

    <section>
      <name>Bitstream Layout</name>

      <section>
        <name>Audio Streams</name>

        <aside>
          <t>
            TK: if we use Skeleton can we include synchronization data so that
            the stems don't have to have the same length? Or will this just make
            things harder to decode with no real benefit (since FLAC or Opus
            would compress the silence)?
          </t>
        </aside>

        <t>
          Each stem file can contain any number of logical bitstreams
          containing audio but MUST include at least three (the original audio
          and at least two stems).
          Every audio stream MUST be encoded using the same codec with the same
          parameters, including bitrate, channel count, channel layout, and
          sample rate.
        </t>
        <t>
          The first logical bitstream MUST be the final post-mix, mastered
          audio.
          This helps preserve backwards compatibility with media players that do
          not support a <xref target="Skeleton"/> bitstream.
          The remaining logical bitstreams are the stems and MUST have the
          same audio length as the first logical bitstream.
          For example, if the first logical bitstream is 3 minutes long and
          the stem file includes a percussion stem in which the percussion does
          not start until one minute into the track, the percussion stem would
          still be 3 minutes long but would begin with a minute of silence.
        </t>
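        <t>
          As a non-normative illustration of the equal-length rule, the
          timeline below sketches a 3 minute file in which the percussion stem
          is padded with one minute of leading silence.
          The stream labels and timings are hypothetical and chosen only to
          mirror the example above; "#" marks audio and "." marks silence.
        </t>
        <artwork type="ascii-art"><![CDATA[
time         0:00      1:00      2:00      3:00
             |---------|---------|---------|
main mix     |#############################|
percussion   |.........|###################|
]]></artwork>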
      </section>

      <section>
        <name>Skeleton</name>

        <aside>
          <t>
            TK: Skeleton seems ideal for the stems use case, but I can't figure
            out if it's still recommended by Xiph.Org or which version we should
            use (the Xiph.Org website has a page for v3, but the wiki has a v4
            that it says is the latest).
            Maybe it would be better to define our own stream/metadata type and
            keep everything there?
            If we're just using Skeleton for per-stream metadata it might be
            overkill anyway since we'll have to define some sort of global
            metadata logical bitstream to store the DSP info.
          </t>
        </aside>

        <t>
          Stem files MUST contain a <xref target="Skeleton"/> bitstream.
          For each fisbone secondary header packet describing a stem logical
          bitstream (i.e., not the fisbone packet describing the first stream
          containing the post-mix audio), the following message headers are
          defined:
        </t>

        <table>
          <thead>
            <tr><th>Message Header</th><th>Requirement Level</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>Role</td><td>REQUIRED</td><td>MUST be "audio/stem"</td></tr>
            <tr><td>Title</td><td>REQUIRED</td><td>Free text, used for the stem name (e.g., "Percussion")</td></tr>
            <tr><td>Stem-color</td><td>OPTIONAL</td><td>Color representing this stem in RGB hex format (e.g., "#145374")</td></tr>
          </tbody>
        </table>

        <t>
          The fisbone secondary header packet describing the first logical
          bitstream containing the main audio MUST set the "Role" message header
          to "audio/main".
        </t>
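
        <t>
          As a non-normative sketch, and assuming the "Name: value" layout that
          Skeleton uses for message header fields, the headers defined in this
          section might appear as follows.
          Other message headers required by Skeleton, and the binary fields of
          the fisbone packet itself, are omitted.
          The first fragment is for the fisbone packet describing the first
          logical bitstream:
        </t>
        <sourcecode><![CDATA[
Role: audio/main
]]></sourcecode>
        <t>
          The second fragment is for a fisbone packet describing a stem, using
          the example values from the table above:
        </t>
        <sourcecode><![CDATA[
Role: audio/stem
Title: Percussion
Stem-color: #145374
]]></sourcecode>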
      </section>
    </section>

    <section>
      <name>Mixing</name>

      <t>
        Stem tracks SHOULD NOT have any gain normalization applied.
        Instead, they should retain the same levels they have in the final mix
        present in the first track so that if all stems were played together at
        unity gain the levels would be equivalent to the final mix.
      </t>
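      <t>
        Put another way, as a non-normative sketch: for K stems with samples
        s_1[n] through s_K[n], playing every stem at unity gain corresponds to
        the per-sample sum below.
        The symbols are illustrative and are not defined by this document, and
        the result is only expected to approximate the levels of the mix before
        any mastering is applied.
      </t>
      <artwork><![CDATA[
mix[n] = s_1[n] + s_2[n] + ... + s_K[n]
]]></artwork>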
    </section>

    <section>
      <name>Mastering</name>

      <aside>
        <t>
          TK: does it make sense to put these in their own Ogg page instead
          of just putting them in the Vorbis comments with everything else?
          It would make them less likely to be stripped out by metadata editors.
          Maybe we define a different raw VorbisComment logical bitstream, or
          use a JSON blob or similar like the NI ones do?
        </t>
      </aside>

      <t>
        Because mastering happens post-mix and the stems are pre-mix audio, the
        stem tracks SHOULD NOT have any mastering steps applied.
        Instead, metadata for configuring a compressor and limiter SHOULD be
        included in the stem file.
        After mixing the stems, applications MAY feed the mix through
        a Digital Signal Processor configured with the limiter and compressor
        settings read from the metadata.
      </t>

      <section>
      <name>Compressor Metadata</name>

      <aside>
        <t>
          TK: I'm not really sure how this works for the NI stems, presumably
          they have a value range, but that probably depends on the specific
          compressor used and that's not likely something we can do in a
          standard format.
          Instead we'd have to define exactly how the DSP works and say that you
          might need to normalize values for specific DSPs? Unclear how best to
          handle this.
        </t>
      </aside>

      <t>
        Metadata used for configuring the compressor should be stored alongside
        the stem file's global metadata (i.e., in the primary VorbisComment).
      </t>

      <table>
        <thead>
          <tr><th>Tag</th><th>Requirement Level</th><th>Values</th></tr>
        </thead>
        <tbody>
          <tr><td>STEM:COMPRESSOR:ENABLED</td><td>REQUIRED</td><td>"TRUE" or "FALSE"</td></tr>
          <tr><td>STEM:COMPRESSOR:RATIO</td><td>OPTIONAL</td><td>TODO</td></tr>
          <tr><td>STEM:COMPRESSOR:OUTPUT_GAIN</td><td>OPTIONAL</td><td>TODO</td></tr>
          <tr><td>STEM:COMPRESSOR:THRESHOLD</td><td>OPTIONAL</td><td>TODO</td></tr>
          <tr><td>STEM:COMPRESSOR:ATTACK</td><td>OPTIONAL</td><td>TODO</td></tr>
          <tr><td>STEM:COMPRESSOR:INPUT_GAIN</td><td>OPTIONAL</td><td>TODO</td></tr>
          <tr><td>STEM:COMPRESSOR:RELEASE</td><td>OPTIONAL</td><td>TODO</td></tr>
          <tr><td>STEM:COMPRESSOR:HP_CUTOFF</td><td>OPTIONAL</td><td>TODO</td></tr>
          <tr><td>STEM:COMPRESSOR:HP_DRY_WET</td><td>OPTIONAL</td><td>TODO</td></tr>
        </tbody>
      </table>
      </section>

      <section>
      <name>Limiter Metadata</name>

      <t>
        Metadata used for configuring the limiter should be stored alongside
        the stem file's global metadata (i.e., in the primary VorbisComment).
      </t>

      <table>
        <thead>
          <tr><th>Tag</th><th>Requirement Level</th><th>Values</th></tr>
        </thead>
        <tbody>
          <tr><td>STEM:LIMITER:ENABLED</td><td>REQUIRED</td><td>"TRUE" or "FALSE"</td></tr>
          <tr><td>STEM:LIMITER:RELEASE</td><td>OPTIONAL</td><td>TODO</td></tr>
          <tr><td>STEM:LIMITER:THRESHOLD</td><td>OPTIONAL</td><td>TODO</td></tr>
          <tr><td>STEM:LIMITER:CEILING</td><td>OPTIONAL</td><td>TODO</td></tr>
        </tbody>
      </table>
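
      <t>
        As a non-normative illustration, and assuming the usual "NAME=value"
        form of Vorbis comment fields, a stem file that enables the limiter but
        not the compressor might carry the following entries in its primary
        VorbisComment.
        Only the REQUIRED tags are shown because the value formats of the
        remaining tags are not yet defined.
      </t>
      <sourcecode><![CDATA[
STEM:COMPRESSOR:ENABLED=FALSE
STEM:LIMITER:ENABLED=TRUE
]]></sourcecode>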
      </section>
    </section>

    <section anchor="IANA">
      <name>IANA Considerations</name>
      <t>This memo includes no request to IANA.</t>
    </section>

    <section anchor="Security">
      <name>Security Considerations</name>
      <t>This document should not affect the security of the Internet.</t>
    </section>
  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3533.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5334.xml"/>
        <reference anchor="Skeleton" target="https://wiki.xiph.org/Ogg_Skeleton_4">
          <front>
            <title>Ogg Skeleton 4</title>
            <author>
              <organization>Xiph.Org Foundation</organization>
            </author>
            <date year="2026" month="02" day="18" />
          </front>
        </reference>
      </references>
      <references>
        <name>Informative References</name>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
      </references>
    </references>

 </back>
</rfc>
