<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="rfc7991bis.rnc"?>

<!DOCTYPE rfc [
  <!ENTITY docname "draft-swhited-ogg-stems-01">
]>
<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  category="info"
  docName="&docname;"
  ipr="trust200902"
  obsoletes=""
  updates=""
  submissionType="IETF"
  xml:lang="en"
  version="3">

  <front>
    <title>Ogg Stem Files</title>
    <seriesInfo name="Internet-Draft" value="&docname;"/>
    <author fullname="Sam Whited" initials="ssw" role="editor" surname="Whited">
      <address>
        <email>sam@samwhited.com</email>
        <uri>https://blog.samwhited.com</uri>
      </address>
    </author>

    <date year="2026" month="2" day="20"/>
    <area>General</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <keyword>audio</keyword>
    <keyword>ogg</keyword>
    <keyword>stems</keyword>
    <keyword>djing</keyword>

    <abstract>
      <t>
        This document defines a multi-track profile of the Ogg container format
        for storing stems that is also backwards compatible with existing media
        players.
      </t>

    </abstract>
  </front>

  <middle>
    <section>
      <name>Introduction</name>
        <t>
          Stems are recordings of individual instruments, or clusters of
          instruments, used by DJs and music producers for live mixing of music.
          Historically, stem files have been stored as individual audio files or
          in patent-encumbered or vendor-specific proprietary container
          formats.
          The Ogg file format developed by the Xiph.Org Foundation was formally
          specified in <xref target="RFC3533"/> and <xref target="RFC5334"/> and
          is well suited as a container for stems.
          This specification documents a profile of the Ogg container format
          that allows it to store lossless or lossy stems, as well as metadata
          about the stems, for use in DJ applications or Digital Audio
          Workstations.
        </t>

      <section anchor="requirements">
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
          "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
          RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
          interpreted as described in BCP 14 <xref target="RFC2119"/>
          <xref target="RFC8174"/> when, and only when, they appear in
          all capitals, as shown here.</t>
      </section>
    </section>

    <section>
      <name>Requirements</name>
      <t>
        Stem files have a few basic requirements:
      </t>
      <ul spacing="normal">
        <li>Backwards compatibility with existing media players</li>
        <li>The ability to store at least 5 audio tracks</li>
        <li>The ability to synchronize playback of multiple audio tracks</li>
        <li>The ability to store file-level metadata and per-stem metadata</li>
      </ul>
    </section>

    <section>
      <name>Bitstream Layout</name>

      <section>
        <name>Audio Streams</name>
        <t>
          Each stem file MUST contain at least three logical bitstreams
          containing audio (the original audio and at least two stems); there
          is no upper limit on the number of stems.
          Each stream MUST be encoded using the same codec with the same
          parameters, including bitrate, channel count, channel layout, and
          sample rate.
        </t>
        <t>
          The first logical bitstream containing audio data MUST be the final
          post-mix, mastered audio.
          This helps preserve backwards compatibility in media players which do
          not support this format (which typically play the first audio stream
          found).
          The remaining logical bitstreams are individual stems and MUST
          have the same audio length as the first logical bitstream, such that
          playing all of the stem streams together from the beginning results
          in the same audio (excluding mastering) as the final mix present in
          the first logical bitstream.
        </t>
        <t>
          For example, if the original logical bitstream is 3 minutes long and
          the stem file includes a percussion track, but the percussion does not
          start until one minute into the track, the percussion stem would still
          be 3 minutes long but would contain a minute of silence at its start.
        </t>
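        <t>
          The layout rules above could be checked along the following lines.
          This is a minimal sketch, not a reference implementation: the
          AudioStream class and the check_layout function are hypothetical
          names introduced for illustration, and the sketch assumes that the
          codec parameters and the length in samples of each logical bitstream
          have already been read by a demuxer.
        </t>
        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioStream:
    """Hypothetical summary of one logical audio bitstream."""
    codec: str
    bitrate: int
    channels: int
    channel_layout: str
    sample_rate: int
    total_samples: int

    def parameters(self) -> tuple:
        """The values that must match across every audio stream."""
        return (self.codec, self.bitrate, self.channels,
                self.channel_layout, self.sample_rate)


def check_layout(streams: list[AudioStream]) -> list[str]:
    """Return a list of problems; an empty list means the layout passes."""
    if len(streams) < 3:
        return ["need the main mix plus at least two stems"]
    problems = []
    main = streams[0]  # the first audio stream is the mastered mix
    for index, stem in enumerate(streams[1:], start=1):
        if stem.parameters() != main.parameters():
            problems.append(
                f"stream {index}: codec parameters differ from the main mix")
        if stem.total_samples != main.total_samples:
            problems.append(
                f"stream {index}: length differs from the main mix")
    return problems
```
]]></sourcecode>
        <t>
          In this sketch, a percussion stem that omits its leading silence is
          reported as a length mismatch because its sample count is lower than
          that of the main mix.
        </t>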
      </section>

      <section>
        <name>Stem Metadata</name>

        <t>
          The following tags MUST be stored in the Vorbis comment block
          encapsulated in the individual FLAC or Opus audio stream representing
          each stem.
          Keys for these tags are case-insensitive.
        </t>

        <table>
          <thead>
            <tr><th>Tag</th><th>Description</th><th>Example</th></tr>
          </thead>
          <tbody>
            <tr><td>STEM:TITLE</td><td>Free text, used for the stem name</td><td>Percussion</td></tr>
            <tr><td>STEM:COLOR</td><td>Color representing this track in RGB hex format</td><td>#145374</td></tr>
          </tbody>
        </table>
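        <t>
          As an illustration, the tags in the table above might be read as
          follows.
          This is a sketch under stated assumptions: it assumes the comment
          fields have already been decoded into "KEY=value" strings, and the
          function names parse_stem_tags and parse_color are hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
```python
import string


def parse_stem_tags(comments: list[str]) -> dict[str, str]:
    """Collect STEM:* fields from "KEY=value" strings.

    Keys are upper-cased because they are case-insensitive.
    """
    tags = {}
    for comment in comments:
        key, separator, value = comment.partition("=")
        if separator and key.upper().startswith("STEM:"):
            tags[key.upper()] = value
    return tags


def parse_color(value: str) -> tuple[int, int, int]:
    """Convert a "#RRGGBB" string into an (r, g, b) tuple of 0-255 integers."""
    digits = value[1:]
    if (len(value) != 7 or value[0] != "#"
            or any(c not in string.hexdigits for c in digits)):
        raise ValueError(f"not an RGB hex color: {value!r}")
    return int(digits[0:2], 16), int(digits[2:4], 16), int(digits[4:6], 16)
```
]]></sourcecode>
        <t>
          With the example values from the table, parse_color("#145374")
          yields the tuple (20, 83, 116).
        </t>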
      </section>

      <section>
        <name>DSP Metadata</name>

        <t>
          It is undesirable to store metadata that applies to all of the stems
          in the individual stream metadata blocks, for several
          reasons:
        </t>

        <ol spacing="normal">
          <li>
            In the absence of a standard, many applications only store
            information on the first stream, but in the case of stems this is
            the one stream to which none of this metadata applies.
          </li>
          <li>
            Applications meant for writing general metadata may remove unknown
            values in the first stream's metadata.
          </li>
          <li>
            Some stem metadata should be associated with all stem streams but
            not the main mix stream, and storing it on every stream is not ideal.
          </li>
        </ol>

        <t>
          To work around these limitations, stem files store metadata that
          applies to all stems (notably information about configuring a basic
          Digital Signal Processor, or DSP) in a separate logical
          bitstream, the first packet of which is structured according
          to the following table:
        </t>

        <table>
          <thead>
            <tr><th>Data</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>8 bytes</td><td>0x53 0x74 0x65 0x6d 0x4d 0x65 0x74 0x61 ("StemMeta")</td></tr>
            <tr><td>2 bytes</td>
              <td>
                Version number of the metadata logical bitstream mapping
                (notably, this is not a version of the metadata stored in the
                bitstream).
                These bytes are 0x01 0x00, meaning version 1.0 of the mapping.
              </td>
            </tr>
          </tbody>
        </table>
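        <t>
          The fixed-size fields in the table above could be parsed as in the
          following sketch.
          The function name parse_stem_meta_header is hypothetical, and
          treating the two version bytes as a (major, minor) pair is an
          assumption drawn from the "version 1.0" wording in the table.
        </t>
        <sourcecode type="python"><![CDATA[
```python
# 0x53 0x74 0x65 0x6d 0x4d 0x65 0x74 0x61
MAGIC = b"StemMeta"


def parse_stem_meta_header(packet: bytes) -> tuple[int, int]:
    """Check the 8-byte signature and return the (major, minor) version."""
    if len(packet) < len(MAGIC) + 2:
        raise ValueError("packet too short for a stem metadata header")
    if packet[:len(MAGIC)] != MAGIC:
        raise ValueError("missing StemMeta signature")
    major = packet[len(MAGIC)]
    minor = packet[len(MAGIC) + 1]
    return major, minor
```
]]></sourcecode>
        <t>
          For a packet beginning with the signature followed by 0x01 0x00,
          this sketch returns (1, 0).
        </t>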

        <t>
          The remainder of the logical bitstream comprises a Vorbis comment
          metadata block containing human-readable information coded in
          UTF-8.
          The name "Vorbis comment" points to the fact that the Vorbis codec
          stores such metadata in almost the same way (see
          <xref target="Vorbis"/>).
          A stem file MUST NOT contain more than one Vorbis comment
          metadata block.
          The Vorbis comment metadata block is defined to be identical to the
          Vorbis comment metadata block defined in Section 8.6,
          "Vorbis Comment", of <xref target="RFC9639"/>.
        </t>

        <t>
          The Vorbis comment metadata block SHOULD NOT be used for arbitrary
          metadata that is unrelated to stems (e.g., a track title or author).
          Vendor-specific tags MAY be included in the metadata block.
          Vendor-specific tags in the block MUST use a vendor-specific
          namespace and MUST NOT prefix their tags with "STEM:".
          Specific keys for the Vorbis comment metadata block are defined in the
          "Mastering" section.
        </t>
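        <t>
          A writer could guard against misuse of the reserved prefix with a
          check like the one sketched below.
          The function name misuses_reserved_prefix and the vendor namespace
          "EXAMPLEVENDOR:" are hypothetical, and comparing keys without regard
          to case is an assumption carried over from the per-stem tags.
        </t>
        <sourcecode type="python"><![CDATA[
```python
RESERVED_PREFIX = "STEM:"


def misuses_reserved_prefix(key: str, defined_keys: set[str]) -> bool:
    """Return True if a key uses "STEM:" without being a defined key.

    defined_keys holds the upper-case keys that this profile defines.
    """
    upper = key.upper()
    return upper.startswith(RESERVED_PREFIX) and upper not in defined_keys
```
]]></sourcecode>
        <t>
          Under these assumptions, a key such as "EXAMPLEVENDOR:CUE_POINT" is
          accepted, while "STEM:EXAMPLEVENDOR:CUE_POINT" is flagged.
        </t>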
      </section>
    </section>

    <section>
      <name>Mixing</name>

      <t>
        The stem tracks SHOULD NOT have any gain normalization applied.
        Instead, they should retain the same levels as they would have in the
        final mix present in the first track, so that if all stems were played
        at unity gain the levels would be equivalent to the final mix.
      </t>
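      <t>
        Mixing at unity gain amounts to summing the stems sample by sample
        with no scaling, as in the following sketch.
        The function name mix_at_unity is hypothetical, and the sketch assumes
        that each stem has already been decoded to a list of floating-point
        samples for a single channel.
      </t>
      <sourcecode type="python"><![CDATA[
```python
def mix_at_unity(stems: list[list[float]]) -> list[float]:
    """Sum equal-length stems sample by sample with no gain applied."""
    if not stems:
        raise ValueError("at least one stem is required")
    length = len(stems[0])
    if any(len(stem) != length for stem in stems):
        raise ValueError("all stems must have the same length")
    # zip(*stems) groups the samples that share a position in time.
    return [sum(samples) for samples in zip(*stems)]
```
]]></sourcecode>
      <t>
        Because no normalization is applied to the stems, the summed output is
        expected to match the levels of the pre-master mix.
      </t>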
    </section>

    <section>
      <name>Mastering</name>

      <t>
        Because mastering happens post-mix and the stems are pre-mix audio, the
        stem tracks SHOULD NOT have any mastering steps applied.
        Instead, metadata for configuring a compressor and limiter SHOULD be
        included in the previously defined Vorbis comment metadata block.
        After mixing the stems, applications MAY choose to feed the mix through
        a Digital Signal Processor configured with the limiter and compressor
        settings read from the metadata.
      </t>

      <section>
        <name>Compressor Metadata</name>

        <aside>
          <t>
            TK: I'm not really sure how this works for the NI stems, presumably
            they have a value range, but that probably depends on the specific
            compressor used and that's not likely something we can do in a
            standard format.
            Instead, we'd have to define exactly how the DSP works and say that
            you might need to normalize values for specific DSPs?
            Unclear how best to handle this.
          </t>
        </aside>

        <table>
          <thead>
            <tr><th>Tag</th><th>Requirement Level</th><th>Values</th></tr>
          </thead>
          <tbody>
            <tr><td>STEM:COMPRESSOR:ENABLED</td><td>REQUIRED</td><td>"TRUE" or "FALSE"</td></tr>
            <tr><td>STEM:COMPRESSOR:RATIO</td><td>OPTIONAL</td><td>TODO</td></tr>
            <tr><td>STEM:COMPRESSOR:OUTPUT_GAIN</td><td>OPTIONAL</td><td>TODO</td></tr>
            <tr><td>STEM:COMPRESSOR:THRESHOLD</td><td>OPTIONAL</td><td>TODO</td></tr>
            <tr><td>STEM:COMPRESSOR:ATTACK</td><td>OPTIONAL</td><td>TODO</td></tr>
            <tr><td>STEM:COMPRESSOR:INPUT_GAIN</td><td>OPTIONAL</td><td>TODO</td></tr>
            <tr><td>STEM:COMPRESSOR:RELEASE</td><td>OPTIONAL</td><td>TODO</td></tr>
            <tr><td>STEM:COMPRESSOR:HP_CUTOFF</td><td>OPTIONAL</td><td>TODO</td></tr>
            <tr><td>STEM:COMPRESSOR:HP_DRY_WET</td><td>OPTIONAL</td><td>TODO</td></tr>
          </tbody>
        </table>
      </section>

      <section>
        <name>Limiter Metadata</name>

        <table>
          <thead>
            <tr><th>Tag</th><th>Requirement Level</th><th>Values</th></tr>
          </thead>
          <tbody>
            <tr><td>STEM:LIMITER:ENABLED</td><td>REQUIRED</td><td>"TRUE" or "FALSE"</td></tr>
            <tr><td>STEM:LIMITER:RELEASE</td><td>OPTIONAL</td><td>TODO</td></tr>
            <tr><td>STEM:LIMITER:THRESHOLD</td><td>OPTIONAL</td><td>TODO</td></tr>
            <tr><td>STEM:LIMITER:CEILING</td><td>OPTIONAL</td><td>TODO</td></tr>
          </tbody>
        </table>
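        <t>
          The REQUIRED "ENABLED" tags from the compressor and limiter tables
          might be read as in the following sketch.
          The function name dsp_enabled is hypothetical, the comments are
          assumed to have been decoded into a dictionary with upper-case keys,
          and matching the values "TRUE" and "FALSE" exactly is an assumption.
          The remaining tags are not handled because their value formats are
          not yet defined.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def dsp_enabled(comments: dict[str, str], unit: str) -> bool:
    """Read STEM:<unit>:ENABLED, where unit is "COMPRESSOR" or "LIMITER"."""
    key = f"STEM:{unit}:ENABLED"
    if key not in comments:
        raise ValueError(f"missing required tag {key}")
    value = comments[key]
    if value == "TRUE":
        return True
    if value == "FALSE":
        return False
    raise ValueError(f"{key} must be TRUE or FALSE, got {value!r}")
```
]]></sourcecode>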
      </section>
    </section>

    <section>
      <name>Use with Ogg Skeleton</name>

      <t>
        Ogg <xref target="Skeleton"/> is a format designed to provide
        structuring information for multi-track Ogg files.
        Its use is not defined for stem files; however, if a Skeleton logical
        bitstream is present, each fisbone secondary header packet describing a
        logical bitstream containing a stem track SHOULD set the "Role"
        message header to the value "audio/stem".
        Similarly, the fisbone secondary header packet describing the first
        logical bitstream containing the main audio SHOULD set the "Role"
        message header to "audio/main".
      </t>
    </section>

    <section anchor="IANA">
      <name>IANA Considerations</name>
      <t>This memo includes no request to IANA.</t>
    </section>

    <section anchor="Security">
      <name>Security Considerations</name>
      <t>This document should not affect the security of the Internet.</t>
    </section>
  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative</name>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3533.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5334.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9639.xml"/>
      </references>
      <references>
        <name>Informative</name>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
        <reference anchor="Vorbis" target="https://xiph.org/vorbis/doc/Vorbis_I_spec.html">
          <front>
            <title>Vorbis I specification</title>
            <author>
              <organization>Xiph.Org Foundation</organization>
            </author>
            <date year="2020" month="07" day="04" />
          </front>
        </reference>
        <reference anchor="Skeleton" target="https://wiki.xiph.org/Ogg_Skeleton_4">
          <front>
            <title>Ogg Skeleton 4</title>
            <author>
              <organization>Xiph.Org Foundation</organization>
            </author>
            <date year="2016" month="05" day="23" />
          </front>
        </reference>
      </references>
    </references>

 </back>
</rfc>
