<?xml version='1.0' encoding='utf-8'?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->
<?xml-model href="rfc7991bis.rnc"?>  <!-- Required for schema validation and schema-aware editing -->
<!-- <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?> -->
<!-- This third-party XSLT can be enabled for direct transformations in XML processors, including most browsers -->

<rfc
      xmlns:xi="http://www.w3.org/2001/XInclude"
      category="info"
      docName="draft-liu-rtgwg-llmsync-multicast-00"
      ipr="trust200902"
      obsoletes=""
      updates=""
      submissionType="IETF"
      xml:lang="en"
      tocInclude="true"
      tocDepth="4"
      symRefs="true"
      sortRefs="true"
      version="3">
  <!-- xml2rfc v2v3 conversion 2.38.1 -->
  <!-- category values: std, bcp, info, exp, and historic
    ipr values: trust200902, noModificationTrust200902, noDerivativesTrust200902,
       or pre5378Trust200902
    you can add the attributes updates="NNNN" and obsoletes="NNNN" 
    they will automatically be output with "(if approved)" -->

 <!-- ***** FRONT MATTER ***** -->

 <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
        full title is longer than 39 characters -->

   <title abbrev="Multicast for LLM Synchronization">Multicast Use Cases for Large Language Model Synchronization</title>
    <seriesInfo name="Internet-Draft" value="draft-liu-rtgwg-llmsync-multicast-00"/>
    <!-- add 'role="editor"' below for the editors if appropriate -->


   <author fullname="Yisong Liu" initials="Y" surname="Liu">
      <organization>China Mobile</organization>
      <address>
        <postal>
          <street/>
          <!-- Reorder these if your country does things differently -->

         <city></city>
          <region/>
          <code/>
          <country>China</country>
        </postal>
        <phone></phone>
        <email>liuyisong@chinamobile.com</email>
        <!-- uri and facsimile elements may also be added -->
     </address>
    </author>

   <author fullname="Zheng Zhang" initials="Z" surname="Zhang">
      <organization>ZTE Corporation</organization>
      <address>
        <postal>
          <street/>
          <!-- Reorder these if your country does things differently -->

         <city></city>
          <region/>
          <code/>
          <country>China</country>
        </postal>
        <phone></phone>
        <email>zhang.zheng@zte.com.cn</email>
        <!-- uri and facsimile elements may also be added -->
     </address>
    </author>
	
	<author fullname="Junye Zhang" initials="J" surname="Zhang">
      <organization>China Mobile</organization>
      <address>
        <postal>
          <street/>
          <!-- Reorder these if your country does things differently -->

         <city></city>
          <region/>
          <code/>
          <country>China</country>
        </postal>
        <phone></phone>
        <email>zhangjunye@chinamobile.com</email>
        <!-- uri and facsimile elements may also be added -->
     </address>
    </author>   
	
    <date year="2026"/>
    <!-- If the month and year are both specified and are the current ones, xml2rfc will fill 
        in the current day for you. If only the current year is specified, xml2rfc will fill 
     in the current day and month for you. If the year is not the current one, it is 
     necessary to specify at least a month (xml2rfc assumes day="1" if not specified for the 
     purpose of calculating the expiry date).  With drafts it is normally sufficient to 
     specify just the year. -->

   <!-- Meta-data Declarations -->

   <area>Routing</area>
    <workgroup>RTGWG</workgroup>
    <!-- WG name at the upperleft corner of the doc,
        IETF is fine for individual submissions.  
     If this element is not present, the default is "Network Working Group",
        which is used by the RFC Editor as a nod to the history of the IETF. -->

   <keyword>LLM Synchronization</keyword>
    <!-- Keywords will be incorporated into HTML output
        files in a meta tag but they have no effect on text or nroff
        output. If you submit your draft to the RFC Editor, the
        keywords will be used for the search engine. -->

   <abstract>
      <t>Large Language Model (LLM) deployments are becoming increasingly widespread, 
	  with inference services being the most common application. 
	  This document discusses multicast use cases for synchronizing models across inference cloud services.</t>
    </abstract>
  </front>
  <middle>
    <section numbered="true" toc="default">
      <name>Introduction</name>
      <t>With the rapid development of AI and the widespread application of Large Language Models (LLMs), 
	  inference has become the most frequently used service. 
	  Different users may use different LLMs, 
	  and the same user may run inference on multiple LLMs simultaneously to obtain the best result. 
	  AI inference cloud providers offer large-scale real-time inference, 
	  fine-tuning, and model optimization services on GPU cloud platforms. 
	  However, the GPU infrastructure of an AI inference cloud provider may span multiple cloud platforms and regions, 
	  which creates significant deployment and operational challenges, 
	  including highly concurrent model loading and severe cold-start latency.</t>

	   <t>This document discusses multicast use cases for model synchronization in inference cloud services.</t>

      <section numbered="true" toc="default">
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
       "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
       document are to be interpreted as described in <xref target="RFC2119" format="default"/>.</t>
      </section>
    </section>
    
    <section numbered="true" toc="default">
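      <name>LLM Synchronization</name>
      <t><xref target="Fig0" format="default"/> shows a typical deployment: a model repository distributes 
	  model files to the local storage of multiple GPU clouds, where they are loaded by the GPU clusters 
	  that serve inference.</t>
      <figure anchor="Fig0">
        <name>Model Synchronization from a Model Repository to GPU Clouds</name>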
        <artwork align="left" name="Figure 0" type="" alt=""><![CDATA[
                        +------------------+
                        | Model Repository |
                        +--+-----+-----+---+
                           |     |     |
                           |     |     |
          +----------------+     |     +----------------+
          |                      |                      |
          |                      |                      |
+-------------------+  +-------------------+  +-------------------+
| +---------------+ |  | +---------------+ |  | +---------------+ |
| | Local Storage | |  | | Local Storage | |  | | Local Storage | |
| +---------------+ |  | +---------------+ |  | +---------------+ |
|                   |  |                   |  |                   |
| +---------------+ |  | +---------------+ |  | +---------------+ |
| |  GPU Cluster  | |  | |  GPU Cluster  | |  | |  GPU Cluster  | |
| +---------------+ |  | +---------------+ |  | +---------------+ |
|                   |  |                   |  |                   |
|    GPU Cloud 1    |  |    GPU Cloud 2    |  |    GPU Cloud n    |
+-------------------+  +-------------------+  +-------------------+
           ]]></artwork>
      </figure>
      
      <t>Highly concurrent model loading refers to the peak-load and concurrency challenge 
	  of downloading the same popular large model to dozens of GPU cloud platforms at the same time. 
	  A single deployment of a model may involve 10 to 100+ replicas, each requiring a complete copy of the model files. 
	  When hundreds of GPU servers within a single cluster download the same model (typically 70 GB to 1 TB in size) 
	  at the same time, the result is enormous bandwidth demand and I/O bottlenecks.</t>
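      <t>To make the bandwidth demand concrete, the Python sketch below compares the bytes that cross 
	  a shared inbound link when every replica downloads its own copy of a model with the bytes 
	  carried when a single multicast copy crosses the link and is replicated downstream. 
	  The function name inbound_bytes, the replica count of 100, and the use of decimal units 
	  (1 GB = 10^9 bytes) are illustrative assumptions; protocol overhead is ignored.</t>
      <sourcecode type="python"><![CDATA[
```python
# Hypothetical sketch: bytes crossing one shared inbound link when many
# replicas of the same model are loaded at once. Decimal units are assumed.
GB = 10**9
TB = 10**12


def inbound_bytes(model_size: int, replicas: int, multicast: bool) -> int:
    """Return the bytes carried on the shared inbound link.

    Unicast: every replica pulls its own complete copy of the model.
    Multicast: one copy crosses the link and is replicated downstream.
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    return model_size if multicast else model_size * replicas


if __name__ == "__main__":
    for size, label in ((70 * GB, "70 GB"), (1 * TB, "1 TB")):
        unicast = inbound_bytes(size, replicas=100, multicast=False)
        mcast = inbound_bytes(size, replicas=100, multicast=True)
        print(f"{label} model, 100 replicas: unicast {unicast / TB:.2f} TB, "
              f"multicast {mcast / TB:.2f} TB")
```
]]></sourcecode>
      <t>With 100 replicas of a 70 GB model, the unicast case carries 7 TB over the link, 
	  while the multicast case carries a single 70 GB copy.</t>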
      
      <t>Severe cold-start latency refers to the long delay in initial model deployment caused by slow downloads. 
	  Replica startup time is limited by inbound network bandwidth, which varies significantly between cloud providers 
	  and is typically much lower than the internal bandwidth of the cluster. 
	  This limitation significantly reduces the download efficiency of large models in practice.</t>
	  
	  <t>Highly concurrent model downloading is a typical multicast application: the same data is delivered 
	  from one source to many receivers at the same time. 
	  Such multicast applications have the following characteristics:</t> 
  
	  <ul spacing="normal">
        <li>Large data volume: Because models are large, typically 70 GB to 1 TB for a single model, 
	  transfers place extremely high demands on network bandwidth.</li>
        <li>Transmission time: To meet cold-start latency requirements, data transmission needs to complete as quickly as possible. 
	  Transmission times exceeding tens of minutes significantly degrade user experience, 
	  so the shorter the transmission time, the better.</li>
      </ul>
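      <t>A rough way to relate these model sizes to the transmission-time requirement is to divide 
	  the data volume by the available inbound bandwidth. The Python sketch below does this under 
	  stated assumptions: the 10 Gb/s inbound bandwidth and the 10-minute budget are hypothetical 
	  values rather than figures from this document, the function names are illustrative, and 
	  protocol overhead and congestion are ignored. When several unicast downloads share one inbound 
	  link, the time to deliver all copies grows with the number of copies, whereas a single 
	  multicast stream sends the model over that link once.</t>
      <sourcecode type="python"><![CDATA[
```python
# Hypothetical sketch: relate model size, inbound bandwidth, and transfer
# time. Decimal units are assumed; protocol overhead and congestion ignored.
GB = 10**9
TB = 10**12


def transfer_seconds(model_bytes: int, link_bps: float, copies: int = 1) -> float:
    """Seconds needed to push `copies` full copies of a model through one link.

    copies > 1 models unicast downloads sharing the same inbound link;
    copies == 1 models a single multicast stream on that link.
    """
    return model_bytes * 8 * copies / link_bps


def required_bps(model_bytes: int, budget_seconds: float) -> float:
    """Minimum sustained bit rate that delivers one copy within the budget."""
    return model_bytes * 8 / budget_seconds


if __name__ == "__main__":
    inbound_bps = 10e9  # hypothetical 10 Gb/s inbound bandwidth
    print(f"70 GB, 1 copy:    {transfer_seconds(70 * GB, inbound_bps):.0f} s")
    print(f"70 GB, 10 copies: {transfer_seconds(70 * GB, inbound_bps, copies=10):.0f} s")
    print(f"1 TB in 10 min needs {required_bps(1 * TB, 600) / 1e9:.1f} Gb/s")
```
]]></sourcecode>
      <t>For example, one copy of a 70 GB model takes 56 seconds at 10 Gb/s, ten unicast copies 
	  sharing that link take 560 seconds, and delivering a 1 TB model within 10 minutes requires 
	  a sustained rate of about 13.3 Gb/s.</t>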
	  
	  <t>Therefore, multicast technology applied to this scenario must provide both 
	  high bandwidth and low latency.</t>
      
    </section>
    
    <section numbered="true" toc="default">
      <name>Applying Multicast Technologies</name>
      <t>Because network bandwidth must be conserved, ingress replication is not suitable for this scenario: 
	  the ingress node sends a separate copy of each packet to every egress node, so links near the ingress 
	  carry the same model data many times. 
	  PIM-SM, SR P2MP, or BIER should be considered instead.</t>
	  
	  <t>Protocol Independent Multicast - Sparse Mode (PIM-SM) <xref target="RFC7761" format="default"/> is 
	  a traditional multicast protocol. 
      It is widely used in scenarios where the set of receivers is relatively stable, such as IPTV. 
	  When the network topology changes, 
	  the multicast tree of each multicast flow must be re-established through PIM-SM signaling after unicast routing (IGP/BGP) converges. 
	  As a result, multicast tree convergence takes much longer than IGP convergence.</t>
      
	  <t>SR P2MP (Segment Routing Point-to-Multipoint) <xref target="I-D.ietf-pim-sr-p2mp-policy"/> 
	  is a relatively new technology that uses 
	  SR-MPLS or SRv6 (Segment Routing over IPv6) tunnels to transport multicast traffic. 
	  The routing module of the controller or ingress node computes the path of the multicast traffic. 
	  The controller or ingress node then installs a SID (Segment Identifier) for multicast replication 
	  at each replication point (i.e., each node where the multicast traffic is replicated) in the network. 
	  When multicast traffic enters the tunnel, it is replicated and forwarded at the replication points 
	  according to these replication SIDs. When the network topology changes, 
	  the controller or ingress node must recompute the replication points 
	  and install the replication SIDs at the replication points that changed, 
	  so that subsequent multicast traffic is forwarded along the new path.</t>
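      <t>The controller-side computation described above can be sketched in Python. This is a 
	  simplification under stated assumptions and is not taken from 
	  <xref target="I-D.ietf-pim-sr-p2mp-policy"/>: it builds a hop-count shortest-path tree from 
	  the root to the leaves over a hypothetical topology, treats every node where the tree branches 
	  as a replication point, and, after a topology change, reports which replication points need new 
	  replication SID state. The node names and the functions replication_points and changed_nodes are 
	  hypothetical; link metrics, root and leaf replication state, and local delivery at leaves are 
	  not modelled.</t>
      <sourcecode type="python"><![CDATA[
```python
from collections import deque
from typing import Dict, List, Set

Topology = Dict[str, List[str]]


def bfs_parents(topology: Topology, root: str) -> Dict[str, str]:
    """Hop-count shortest-path tree: map each reachable node to its parent."""
    parents = {root: root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in topology.get(node, []):
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return parents


def replication_points(topology: Topology, root: str,
                       leaves: List[str]) -> Dict[str, Set[str]]:
    """Map each branching node of the P2MP tree to its downstream next hops."""
    parents = bfs_parents(topology, root)
    children: Dict[str, Set[str]] = {}
    for leaf in leaves:
        if leaf not in parents:
            raise ValueError(f"leaf {leaf} is unreachable from {root}")
        node = leaf
        while node != root:               # walk the tree back to the root
            upstream = parents[node]
            children.setdefault(upstream, set()).add(node)
            node = upstream
    return {n: hops for n, hops in children.items() if len(hops) > 1}


def changed_nodes(old: Dict[str, Set[str]], new: Dict[str, Set[str]]) -> Set[str]:
    """Replication points whose SID state must be installed, updated, or removed."""
    return {n for n in old.keys() | new.keys() if old.get(n) != new.get(n)}


if __name__ == "__main__":
    # Hypothetical topology: root R, leaves L1-L3 (e.g., GPU cloud gateways).
    before = {"R": ["A"], "A": ["R", "B", "C"], "B": ["A", "L1", "L2", "C"],
              "C": ["A", "B", "L3"], "L1": ["B"], "L2": ["B"], "L3": ["C"]}
    # Link A-C fails; C is now reached through B.
    after = {"R": ["A"], "A": ["R", "B"], "B": ["A", "L1", "L2", "C"],
             "C": ["B", "L3"], "L1": ["B"], "L2": ["B"], "L3": ["C"]}
    leaves = ["L1", "L2", "L3"]
    old = replication_points(before, "R", leaves)
    new = replication_points(after, "R", leaves)
    print("before:", old)
    print("after:", new)
    print("nodes to update:", sorted(changed_nodes(old, new)))
```
]]></sourcecode>
      <t>In this example, the failure of link A-C removes A as a replication point and adds a branch 
	  toward C at B, so only A and B need updated replication state.</t>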
	  
      <t>BIER (Bit-Indexed Explicit Replication) <xref target="RFC8279" format="default"/> is an architecture 
      that provides optimal multicast forwarding through a "multicast domain", 
      without requiring intermediate routers to maintain any per-flow state or to run an explicit tree-building protocol. 
      Instead, the ingress router encodes the set of destination egress routers as a BitString in each packet, 
      which makes BIER more flexible than PIM-SM and SR P2MP. 
      When a link failure or other fault occurs on the multicast forwarding path, 
	  BIER converges together with the IGP, much faster than PIM-SM and SR P2MP.</t>
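      <t>The per-packet forwarding behavior that lets BIER avoid per-flow state can be sketched in 
	  Python, loosely following the forwarding procedure described in <xref target="RFC8279"/>. 
	  In this sketch the BitString is a Python integer in which bit position k (1-based) corresponds 
	  to the bit-forwarding egress router (BFER) with that BFR-id within a single set identifier, and 
	  each Bit Index Forwarding Table (BIFT) entry maps a bit position to a forwarding bit mask (F-BM) 
	  and a BFR neighbor. The table contents, neighbor names, and the function name bier_forward are 
	  hypothetical; encapsulation, multiple sets, and local delivery are not modelled.</t>
      <sourcecode type="python"><![CDATA[
```python
from typing import Dict, List, Tuple

# Hypothetical BIFT: bit position -> (forwarding bit mask, BFR neighbor).
Bift = Dict[int, Tuple[int, str]]


def bier_forward(bitstring: int, bift: Bift) -> List[Tuple[str, int]]:
    """Return the (neighbor, BitString) pair for each packet copy sent."""
    copies: List[Tuple[str, int]] = []
    remaining = bitstring
    while remaining:
        lowest = remaining & -remaining       # isolate the lowest set bit
        position = lowest.bit_length()        # its 1-based bit position
        entry = bift.get(position)
        if entry is None:                     # no route to this BFER
            remaining &= ~lowest
            continue
        f_bm, neighbor = entry
        # The copy keeps only the BFERs reachable through this neighbor.
        copies.append((neighbor, remaining & f_bm))
        # Clear those BFERs so no other copy is sent for them; also clear
        # the current bit in case a malformed F-BM omits it.
        remaining &= ~(f_bm | lowest)
    return copies


if __name__ == "__main__":
    # BFERs 1 and 2 are reached via N1, BFERs 3 and 4 via N2.
    bift = {1: (0b0011, "N1"), 2: (0b0011, "N1"),
            3: (0b1100, "N2"), 4: (0b1100, "N2")}
    for neighbor, bits in bier_forward(0b1011, bift):
        print(neighbor, format(bits, "04b"))
```
]]></sourcecode>
      <t>A packet addressed to BFERs 1, 2, and 4 (BitString 1011) thus leaves as one copy to N1 with 
	  BitString 0011 and one copy to N2 with BitString 1000; the router keeps no state about the 
	  flow itself.</t>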
      
      <t>When applying multicast to large model synchronization, if a model is always synchronized 
	  to the same set of destination GPU clouds, a multicast tree can be pre-established with PIM-SM, 
	  or the SR replication path can be pre-computed by the controller for SR P2MP, and the model can then be copied over that tree or path.</t>

      <t>If the set of destination GPU clouds differs for each model synchronization, 
	  establishing a new multicast tree or path for each synchronization with PIM-SM or SR P2MP may be inefficient, 
	  because multicast tree establishment takes time. 
	  In this case, BIER is a better choice: the ingress router only has to set the bits of the destination routers 
	  in the BitString, and no tree needs to be built.</t>
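      <t>The following Python sketch illustrates why a changing destination set is cheap with BIER: 
	  for each synchronization, the ingress router only computes the BitString for the destination 
	  GPU clouds, with no tree setup. It assumes that each GPU cloud gateway has been assigned a 
	  BFR-id, and it maps a BFR-id to a set identifier (SI) and a bit position using the 
	  BitStringLength (BSL), as described in <xref target="RFC8279"/>. The assignment of GPU clouds 
	  to BFR-ids, the BSL of 256, and the function name build_bitstrings are hypothetical.</t>
      <sourcecode type="python"><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Iterable


def build_bitstrings(bfr_ids: Iterable[int], bsl: int = 256) -> Dict[int, int]:
    """Map each set identifier (SI) to the BitString covering the given BFERs.

    Bit position p (1-based) is stored as the integer bit 1 << (p - 1).
    """
    by_si: Dict[int, int] = defaultdict(int)
    for bfr_id in bfr_ids:
        if bfr_id < 1:
            raise ValueError("BFR-ids start at 1")
        si, offset = divmod(bfr_id - 1, bsl)   # SI and zero-based bit offset
        by_si[si] |= 1 << offset
    return dict(by_si)


if __name__ == "__main__":
    # Hypothetical assignment of BFR-ids to GPU cloud gateways.
    cloud_bfr_id = {"GPU Cloud 1": 1, "GPU Cloud 2": 3, "GPU Cloud n": 300}
    # Two synchronizations with different destination sets: no tree is
    # rebuilt; only the BitStrings carried in the packets differ.
    for targets in (["GPU Cloud 1", "GPU Cloud 2"], ["GPU Cloud 2", "GPU Cloud n"]):
        bitstrings = build_bitstrings(cloud_bfr_id[t] for t in targets)
        print(targets, {si: hex(bits) for si, bits in bitstrings.items()})
```
]]></sourcecode>
      <t>When the destination BFERs fall into different SIs, the ingress sends one packet copy per SI, 
	  so assigning BFR-ids within the same set to GPU clouds that are usually synchronized together 
	  may keep the number of copies low.</t>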
    </section>
    
    <section anchor="IANA" numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>
    <section anchor="Security" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>This document does not introduce any new security issues.</t>
    </section>
  </middle>
  <!--  *****BACK MATTER ***** -->

 <back>

   <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <?rfc include="reference.RFC.2119.xml"?>    
        <?rfc include="reference.RFC.7761.xml"?>
        <?rfc include="reference.RFC.8279.xml"?>
      </references>
      <references>
        <name>Informative References</name>
        <?rfc include="reference.I-D.ietf-pim-sr-p2mp-policy.xml"?>
    </references>
    </references>
 </back>
</rfc>
