<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc version 1.6.39 (Ruby 3.0.2) -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-fu-sidrops-rpki-repositories-monitoring-00" category="std" consensus="true" submissionType="IETF" version="3">
  <!-- xml2rfc v2v3 conversion 3.18.0 -->
  <front>
    <title abbrev="RPKI Repository Monitoring">Operational Monitoring of RPKI Repository Health and Safety</title>

    <seriesInfo name="Internet-Draft" value="draft-fu-sidrops-rpki-repositories-monitoring-00"/>
   

    <author initials="Y." surname="Fu" fullname="Yonghong Fu">
        <organization>Zhongguancun Laboratory</organization>
        <address>
          <postal>
            <city>Beijing</city>
            <country>China</country>
          </postal>
          <email>fuyh@mail.zgclab.edu.cn</email>
        </address>
      </author> 
    
    <author initials="M." surname="Xu" fullname="Mingwei Xu">
        <organization>Tsinghua University</organization>
        <address>
          <postal>
            <city>Beijing</city>
            <country>China</country>
          </postal>
          <email>xmw@cernet.edu.cn</email>
        </address>
      </author>
    
    <author initials="Y." surname="Wang" fullname="Yangyang Wang">
        <organization>Tsinghua University</organization>
        <address>
          <postal>
            <city>Beijing</city>
            <country>China</country>
          </postal>
          <email>wyy@cernet.edu.cn</email>
        </address>
      </author>
    
    <author initials="J." surname="Zhang" fullname="Jia Zhang">
        <organization>Zhongguancun Laboratory</organization>
        <address>
          <postal>
            <city>Beijing</city>
            <country>China</country>
          </postal>
          <email>zhangj@mail.zgclab.edu.cn</email>
        </address>
      </author>

    <author initials="Y." surname="Zhang" fullname="Yuanyuan Zhang">
        <organization>Zhongguancun Laboratory</organization>
        <address>
          <postal>
            <city>Beijing</city>
            <country>China</country>
          </postal>
          <email>zhangyy@zgclab.edu.cn</email>
        </address>
      </author>
    
    
    
    



    <date year="2026"/>

    <area>Operations and Management Area (OPS)</area>
    <workgroup>SIDR Operations Working Group</workgroup>

    <keyword>RPKI</keyword>
    <keyword>Monitoring</keyword>

    <abstract>
      <t>The Resource Public Key Infrastructure (RPKI) relies on a globally distributed set of repositories to deliver signed routing authorization data to Relying Parties (RPs). Internet Service Providers (ISPs) depend on RPs to collect RPKI objects from these repositories and validate them cryptographically. This produces hundreds of thousands of Validated ROA Payloads (VRPs) derived from Route Origin Authorizations (ROAs). Even with multiple RPs deployed, however, ISPs have limited insight into the operational health and reliability of each repository. When the validation state of many routes suddenly changes from valid to unknown or invalid, operators often lack the information needed to diagnose the cause, which may be an outage or instability in a specific repository. Consequently, ISPs cannot easily determine whether these changes are caused by routine updates, malicious behavior, or underlying repository instability.</t>

      <t>This document provides operational guidance for monitoring the health and safety of RPKI repositories on a per-repository basis. It defines measurable indicators related to reachability, availability, and content integrity. It also explains how these metrics can be used to detect degraded performance or potentially unsafe repository behavior. In addition, it provides recommendations for alerting and operational response. The goal is to improve the transparency, operational availability, and security of the RPKI ecosystem.</t>
    </abstract>
 
  </front>

  <middle>
    
    <section>
      <name>Introduction</name>
		<t>The Resource Public Key Infrastructure (RPKI) architecture is described in <xref target="RFC6480"/>. It defines a framework that represents the allocation hierarchy of IP address space and Autonomous System (AS) numbers. It also defines a distributed repository system for storing and disseminating the signed objects used to improve routing security. Internet Service Providers (ISPs) and other participants rely on Relying Parties (RPs) to retrieve this published information from the repositories and validate it. RPs synchronize repository contents using the rsync protocol and the RPKI Repository Delta Protocol (RRDP), which are described in <xref target="RFC5718"/> and <xref target="RFC8182"/>. Operational best current practices for deploying and managing an RPKI Publication Server are described in <xref target="I-D.ietf-sidrops-publication-server-bcp-profile"/>.</t>
		
      
      <section>
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 <xref target="RFC2119"/>
          <xref target="RFC8174"/> when, and only when, they appear in all capitals, as shown here.</t>
      </section>
    </section>
    
  <section>
    <name>Problem Statement</name>
    <t>
      The RPKI consists of a large and growing number of independently operated repositories distributed across multiple networks, organizations, and geographic regions. Internet Service Providers (ISPs) depend on Relying Parties (RPs) to collect RPKI objects from these distributed repositories and validate them cryptographically, resulting in hundreds of thousands of Validated ROA Payloads (VRPs). </t>
	
	<t>
	However, even with multiple RPs deployed, ISPs have limited insight into the operational health and reliability of each repository. This is because RPs generally treat all repositories uniformly and do not maintain a persistent behavioral profile for each one. When the validation state of many routes suddenly changes from valid to unknown or invalid, operators often lack sufficient information to diagnose the cause, which may be an outage or instability in a specific repository. Moreover, not all repositories are well maintained: some are unreachable, and others serve outdated objects.
	</t>
	
	<t>
	At present, ISPs lack the ability to distinguish whether changes in RPKI objects are due to routine updates, malicious behavior, or systemic issues within the repositories. In the absence of consistent per-repository monitoring and operational visibility, operators face significant challenges in identifying degraded repositories, correlating incidents across networks, and proactively detecting emerging risks. 
	</t>
	
	<t>
	This document seeks to address these gaps. It provides operational guidance for monitoring the health and safety of RPKI repositories on a per-repository basis. It identifies measurable indicators related to reachability, availability, content integrity, and publication stability. It also describes how these indicators can be used to detect degraded or unsafe repository behavior. Finally, it provides recommendations for alerting and operational response. The goal is to improve the transparency, operational availability, and security of the RPKI ecosystem.
	</t>
	
  </section>
  

<section anchor="system-model" numbered="true" toc="include">
	<name> Metric Model</name>
	
	<section anchor="overview">
		<name> Overview </name>
		<t> This section summarizes the classes of metrics defined in this document. </t>
		<t> Metrics are divided into three classes:</t>
		<t> Base Counters: primitive observable events, used for diagnostics and as inputs to derived metrics.</t>
		<t> Health Indicators: ratios or computed values representing the instantaneous correctness and usability of a repository. These indicators SHOULD be used for alerting.</t>
		<t> State-Change (Churn) Indicators: metrics representing differences between successive repository snapshots. These indicators detect abnormal or unexpected publication behavior over time.</t>
		<t> Monitoring systems MUST implement Base Counters, MUST compute Health Indicators, and SHOULD compute State-Change Indicators.</t>
	</section>
	
	<section anchor="observation-window">
		<name> Observation Window </name>
		<t> 
		Indicators SHOULD be computed over a configurable time window. Windows MAY be sliding or tumbling. Implementations SHOULD document the window duration. </t>
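		<t>The following Python code is a minimal sketch of a sliding observation window. It is not a definitive implementation. It assumes that events are recorded with a timestamp in seconds, in non-decreasing order, and that the window length is set by the operator. The class name SlidingWindowCounter is hypothetical.</t>
		<sourcecode type="python"><![CDATA[
# Sketch: counting events over a sliding observation window.
# Assumptions: events carry a timestamp in seconds, appended in
# non-decreasing order; the window length is operator-configurable.
from collections import deque
from typing import Deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall in (now - window, now]."""

    def __init__(self, window: float) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self._events: Deque[float] = deque()

    def record(self, timestamp: float) -> None:
        self._events.append(timestamp)

    def count(self, now: float) -> int:
        # Discard events at or before the start of the window.
        # "now" is assumed to be non-decreasing between calls.
        while self._events and self._events[0] <= now - self.window:
            self._events.popleft()
        return len(self._events)
]]></sourcecode>
		<t>A tumbling window would instead reset each count at fixed interval boundaries. Either choice satisfies the requirement above, provided the window duration is documented.</t>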
	</section>
</section>	

 

  <section>
    <name>Base Counters</name>
    <t>Counters defined in this section are per repository and per transport unless otherwise stated.</t>
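    <t>One possible way to keep the counters in this section per repository and per transport is sketched below in Python. The sketch assumes that a repository is identified by its URI string and that the transport label is "rsync" or "rrdp". The names BaseCounters and counters are hypothetical.</t>
    <sourcecode type="python"><![CDATA[
# Sketch: base counters kept per (repository URI, transport) pair.
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Optional, Tuple


@dataclass
class BaseCounters:
    # Transport counters
    attempted_connections: int = 0
    successful_connections: int = 0
    failed_connections: int = 0
    successful_dns_resolutions: int = 0
    total_dns_queries: int = 0
    # Synchronization counters
    attempted_syncs: int = 0
    successful_syncs: int = 0
    failed_syncs: int = 0
    # Object retrieval counters
    attempted_object_fetches: int = 0
    successful_object_fetches: int = 0
    failed_object_fetches: int = 0
    # Validation counters
    total_objects: int = 0
    valid_objects: int = 0
    invalid_objects: int = 0
    referenced_objects: int = 0
    present_referenced_objects: int = 0
    # Repository update time (seconds since the epoch), if observed
    observed_repository_update: Optional[float] = None


# A fresh, zeroed record is created on first access for each key.
counters: DefaultDict[Tuple[str, str], BaseCounters] = defaultdict(BaseCounters)
]]></sourcecode>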

	<section>
    <name>Transport Counters</name>
    <t> attempted_connections: Number of connection attempts initiated.</t>
	<t> successful_connections: Number of connection attempts that succeeded.</t>
	<t> failed_connections: Number of connection attempts that failed.</t>
	<t> successful_dns_resolutions: Number of DNS queries that resolved successfully.</t>
	<t> total_dns_queries: Total number of DNS queries issued.</t>
    </section>
	
	<section>
    <name>Synchronization Counters</name>
    <t> attempted_syncs: Number of synchronization attempts.</t>
	<t> successful_syncs: Number of synchronization attempts completed without error.</t>
	<t> failed_syncs: Number of synchronization attempts that failed.</t>
    </section>
	
	<section>
    <name>Object Retrieval Counters</name>
    <t> attempted_object_fetches: Number of attempted object retrievals (for example, ROAs, certificates, manifests, and CRLs).</t>
	<t> successful_object_fetches: Number of objects downloaded successfully.</t>
	<t> failed_object_fetches: Number of object downloads that failed.</t>
    </section>
	
	<section>
    <name>Validation Counters</name>
    <t> total_objects: Total number of objects processed by validation (for example, ROAs and certificates). </t>
	<t> valid_objects: Number of objects that passed validation.</t>
	<t> invalid_objects: Number of objects that failed validation.</t>
	<t> referenced_objects: Number of objects listed in manifests.</t>
	<t> present_referenced_objects: Number of objects listed in manifests that were actually downloaded.</t>
    </section>
	
	<section>
    <name>Repository Update Time</name>
    <t> observed_repository_update: Time at which the monitoring system most recently observed a change in the repository contents. </t>
    </section>
</section>	

<section>
    <name>Derived Health Indicators</name>
    <t>The indicators defined in this section measure the instantaneous operational health of a repository, including reachability, availability, and integrity.</t>

    <section>
    <name>Reachability Indicators</name>
		<section>
		<name>Transport Reachability Ratio (TRR)</name>
		<t> TRR = successful_connections / attempted_connections </t>
		<t> Measures the probability that the repository endpoint can be contacted. </t>
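		<t>The following Python code is a minimal sketch of computing TRR. It assumes that a window with no connection attempts yields an undefined ratio, reported as None rather than 0 or 1. This prevents an idle window from being mistaken for an outage. The helper name ratio is hypothetical, and the same helper could serve the other ratio indicators.</t>
		<sourcecode type="python"><![CDATA[
# Sketch: health ratio with explicit handling of empty windows.
from typing import Optional


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None if the denominator is zero."""
    if numerator < 0 or denominator < 0:
        raise ValueError("counters must be non-negative")
    if denominator == 0:
        return None  # no attempts in this window: ratio undefined
    return numerator / denominator


def transport_reachability_ratio(successful_connections: int,
                                 attempted_connections: int) -> Optional[float]:
    # TRR = successful_connections / attempted_connections
    return ratio(successful_connections, attempted_connections)
]]></sourcecode>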
		</section>
		<section>
		<name>DNS Resolution Success Rate (DRSR)</name>
		<t> DRSR = successful_dns_resolutions / total_dns_queries </t>
		<t> Detects DNS-related failures. </t>
		</section>
    </section>
	
	<section>
    <name>Availability Indicators</name>
		<section>
		<name>Fetch Success Ratio (FSR)</name>
		<t> FSR = successful_object_fetches / attempted_object_fetches </t>
		<t> Measures the reliability of object delivery. </t>
		</section>
		<section>
		<name>Synchronization Success Ratio (SSR)</name>
		<t> SSR = successful_syncs / attempted_syncs </t>
		<t> Measures the probability that a complete update can be obtained. </t>
		<t> Persistent low values indicate degraded availability. </t>
		</section>
		<section>
		<name>Update Freshness (UF)</name>
		<t> UF = now − last_observed_repository_update </t>
		<t> UF measures repository staleness and is time-based rather than a ratio. </t>
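		<t>Because UF is a duration rather than a ratio, it is naturally compared against a maximum age. The following Python code is a minimal sketch of this comparison. It assumes timezone-aware UTC timestamps. The max_age parameter is a hypothetical operator-chosen staleness threshold.</t>
		<sourcecode type="python"><![CDATA[
# Sketch: Update Freshness (UF) as a duration.
from datetime import datetime, timedelta, timezone
from typing import Optional


def update_freshness(last_observed_repository_update: datetime,
                     now: Optional[datetime] = None) -> timedelta:
    """UF = now - last_observed_repository_update."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_observed_repository_update


def is_stale(uf: timedelta, max_age: timedelta) -> bool:
    # max_age is a hypothetical operator-configured threshold.
    return uf > max_age
]]></sourcecode>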
		</section>
    </section>
	
	<section>
    <name>Content Integrity Indicators</name>
		<section>
		<name>Validation Success Ratio (VSR)</name>
		<t> VSR = valid_objects / total_objects </t>
		<t> Indicates the fraction of objects that are cryptographically and syntactically valid. </t>
		</section>
		<section>
		<name>Object Consistency Ratio (OCR)</name>
		<t> OCR = present_referenced_objects / referenced_objects </t>
		<t> Here, referenced_objects is the number of files listed on the manifest, and present_referenced_objects is the number of those files that were actually downloaded successfully. </t>
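		<t>The following Python code is a minimal sketch of computing OCR for a single manifest. It assumes that the manifest's file list has already been parsed into file names and that the names of successfully downloaded files from the same publication point are known. Files that were downloaded but are not listed on the manifest do not affect this ratio.</t>
		<sourcecode type="python"><![CDATA[
# Sketch: Object Consistency Ratio (OCR) for one manifest.
from typing import Iterable, Optional


def object_consistency_ratio(manifest_files: Iterable[str],
                             downloaded: Iterable[str]) -> Optional[float]:
    referenced = set(manifest_files)        # referenced_objects
    present = referenced & set(downloaded)  # present_referenced_objects
    if not referenced:
        return None  # empty manifest: ratio undefined
    return len(present) / len(referenced)
]]></sourcecode>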
		</section>
		<section>
		<name>Hash Mismatch Rate (HMR)</name>
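		<t> HMR = hash_mismatches / hash_verifications </t>
		<t> Here, hash_verifications is the number of retrieved objects whose hash was compared with the value listed on the manifest, and hash_mismatches is the number of those comparisons that failed. </t>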
		<t> Non-zero values MUST be treated as critical integrity failures. </t>
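		<t>The following Python code is a minimal sketch of computing HMR. It assumes that the manifest's file list has been parsed into a mapping from file name to the listed SHA-256 digest (raw bytes). It also assumes that the retrieved files are available as a mapping from file name to content. In this sketch, hash_verifications counts each listed file that was actually retrieved and compared, and hash_mismatches counts the comparisons that failed. Missing files are left to OCR.</t>
		<sourcecode type="python"><![CDATA[
# Sketch: Hash Mismatch Rate (HMR) against manifest-listed digests.
import hashlib
from typing import Dict, Optional


def hash_mismatch_rate(expected: Dict[str, bytes],
                       fetched: Dict[str, bytes]) -> Optional[float]:
    hash_verifications = 0
    hash_mismatches = 0
    for name, digest in expected.items():
        content = fetched.get(name)
        if content is None:
            continue  # missing files are reflected in OCR, not HMR
        hash_verifications += 1
        if hashlib.sha256(content).digest() != digest:
            hash_mismatches += 1
    if hash_verifications == 0:
        return None
    return hash_mismatches / hash_verifications
]]></sourcecode>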
		</section>
    </section>
	
	
	<section>
    <name>Alerting Guidance</name>
	<t> Monitoring systems SHOULD generate alerts when TRR, SSR, FSR, or OCR falls below a configured threshold value, and when HMR is non-zero. </t>
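	<t>The following Python code is a minimal sketch of threshold-based alerting. It assumes that TRR, SSR (the Synchronization Success Ratio), FSR, and OCR are compared against operator-configured lower bounds. The numbers shown are hypothetical placeholders, not recommendations. HMR is handled separately because any non-zero value is a critical integrity failure.</t>
	<sourcecode type="python"><![CDATA[
# Sketch: threshold-based alerting on health indicators.
from typing import Dict, List, Optional

# Hypothetical lower bounds; real values are operator-configured.
LOWER_BOUNDS = {"TRR": 0.95, "SSR": 0.95, "FSR": 0.99, "OCR": 1.0}


def evaluate_alerts(indicators: Dict[str, Optional[float]]) -> List[str]:
    alerts = []
    for name, bound in LOWER_BOUNDS.items():
        value = indicators.get(name)
        # None means the ratio was undefined in this window; skip it.
        if value is not None and value < bound:
            alerts.append(f"{name}={value:.3f} below {bound}")
    hmr = indicators.get("HMR")
    if hmr is not None and hmr > 0:
        alerts.append(f"CRITICAL: HMR={hmr:.3f}")
    return alerts
]]></sourcecode>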
    </section>
</section>


<section>
    <name>State-Change and Churn Indicators</name>
	
    <section>
    <name>Overview</name>
	 <t>A repository can remain fully reachable and internally consistent while exhibiting abnormal or unsafe publication behavior. Examples include sudden bulk withdrawal of ROAs, excessive object churn, and incomplete or partially applied updates. Such events can materially affect routing outcomes even when health indicators remain nominal. </t>
	 <t>To detect these conditions, monitoring systems should evaluate state-change indicators (also known as churn indicators), which measure the differences between consecutive repository states. When one of these indicators exceeds an ISP-configured threshold, the monitoring system raises an alarm. </t>
	 <t>These indicators provide temporal visibility and enable detection of unexpected or anomalous repository behavior. </t>
	 </section>
	 
	<section>
	<name>Snapshot Model</name>
	<t> After each successful synchronization, a monitoring system SHOULD construct a repository snapshot containing at least: </t>
	<t> validated object identifiers, </t>
	<t> object hashes, </t>
	<t> object types (ROA, certificate, CRL, manifest, etc.), </t>
	<t> RRDP session identifiers and serial numbers. </t>
	<t> Change indicators are computed by comparing the current snapshot with the most recent prior successful snapshot. </t>
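	<t>The following Python code is a minimal sketch of a snapshot containing the fields listed above. It makes two simplifying assumptions, used for illustration only. First, the object URI serves as the validated object identifier. Second, the object type is inferred from the file extension. The names ObjectRecord, Snapshot, and build_snapshot are hypothetical.</t>
	<sourcecode type="python"><![CDATA[
# Sketch of a repository snapshot built after a successful sync.
import hashlib
import posixpath
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical extension-to-type mapping used only in this example.
TYPE_BY_EXTENSION = {".roa": "ROA", ".cer": "certificate",
                     ".crl": "CRL", ".mft": "manifest"}


@dataclass(frozen=True)
class ObjectRecord:
    sha256: str       # hex digest of the object content
    object_type: str  # ROA, certificate, CRL, manifest, or "other"


@dataclass
class Snapshot:
    objects: Dict[str, ObjectRecord] = field(default_factory=dict)
    rrdp_session_id: Optional[str] = None
    rrdp_serial: Optional[int] = None


def build_snapshot(validated: Dict[str, bytes],
                   session_id: Optional[str] = None,
                   serial: Optional[int] = None) -> Snapshot:
    """Build a snapshot from validated objects keyed by object URI."""
    snap = Snapshot(rrdp_session_id=session_id, rrdp_serial=serial)
    for uri, content in validated.items():
        ext = posixpath.splitext(uri)[1].lower()
        snap.objects[uri] = ObjectRecord(
            sha256=hashlib.sha256(content).hexdigest(),
            object_type=TYPE_BY_EXTENSION.get(ext, "other"),
        )
    return snap
]]></sourcecode>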
	</section>
		
	<section>
	<name>General Object Churn</name>
		<section>
		<name>Object Change Count (OCC)</name>
		<t> OCC = added_objects + removed_objects + modified_objects </t>
		<t> added_objects is the number of objects newly observed, </t>
		<t> removed_objects is the number of previously observed objects that are no longer present, </t>
		<t> modified_objects is the number of objects whose content hash has changed. </t>
		<t> OCC provides an absolute measure of repository churn. </t>
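		<t>The following Python code is a minimal sketch of how OCC could be derived by comparing two snapshots. It assumes that each snapshot is reduced to a mapping from object identifier (here, the object URI) to the hex digest of the object's content.</t>
		<sourcecode type="python"><![CDATA[
# Sketch: computing OCC by diffing two snapshots.
from typing import Dict, NamedTuple


class ObjectDiff(NamedTuple):
    added: int
    removed: int
    modified: int

    @property
    def occ(self) -> int:
        # OCC = added_objects + removed_objects + modified_objects
        return self.added + self.removed + self.modified


def diff_snapshots(previous: Dict[str, str],
                   current: Dict[str, str]) -> ObjectDiff:
    prev_ids, curr_ids = set(previous), set(current)
    added = len(curr_ids - prev_ids)
    removed = len(prev_ids - curr_ids)
    # Present in both snapshots but with a different content hash.
    modified = sum(1 for oid in prev_ids & curr_ids
                   if previous[oid] != current[oid])
    return ObjectDiff(added, removed, modified)
]]></sourcecode>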
		</section>
			
		<section>
		<name> Object Change Ratio (OCRate) </name>
		<t> OCRate = OCC / previous_total_objects </t>
		<t> This indicator normalizes churn by repository size and enables comparison across repositories. </t>
		<t> Large values can indicate bulk re-publication, tooling errors, storage faults, or other abnormal behavior. </t>
		<t> Monitoring systems SHOULD track historical baselines for this value. </t>
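		<t>This document does not prescribe a baseline method. The following Python code sketches one possible choice, assumed here for illustration: flag an OCRate that exceeds the historical mean by more than k population standard deviations. The parameters k and min_history are hypothetical.</t>
		<sourcecode type="python"><![CDATA[
# Sketch: OCRate and a simple mean-plus-k-sigma baseline check.
import statistics
from typing import List, Optional


def object_change_rate(occ: int,
                       previous_total_objects: int) -> Optional[float]:
    # OCRate = OCC / previous_total_objects
    if previous_total_objects == 0:
        return None  # no prior objects: rate undefined
    return occ / previous_total_objects


def exceeds_baseline(value: float, history: List[float],
                     k: float = 3.0, min_history: int = 10) -> bool:
    if len(history) < min_history:
        return False  # not enough history to form a baseline
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    return value > mean + k * stdev
]]></sourcecode>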
		</section>
	</section>
		
	<section>
	<name> ROA Stability Indicators</name>
	<t> Because ROAs directly affect route validation outcomes, their stability is particularly important. </t>
		
		<section>
		<name> ROA Count Delta (RCD) </name>
		<t> RCD = added_roas − removed_roas </t>
		<t> RCD equals the change in the total number of ROAs between consecutive snapshots. Large negative values can indicate accidental withdrawal. </t>
		<t> Large positive values can indicate bulk reissuance. </t>
		</section>
			
		<section>
		<name> ROA Change Ratio (RCR) </name>
		<t> RCR = (added_roas + removed_roas + modified_roas) / previous_roa_count </t>
		<t> Measures relative ROA churn. </t>
		<t> Persistent or sudden spikes SHOULD generate alerts. </t>
		</section>
			
		<section>
		<name> ROA Withdrawal Ratio (RWR) </name>
		<t> RWR = removed_roas / previous_roa_count </t>
		<t> An unexpectedly large RWR, that is, one exceeding the threshold configured by the ISP, SHOULD trigger an alarm. </t>
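		<t>The following Python code is a minimal sketch of computing RCR and RWR from two snapshots. It assumes that each snapshot is reduced to a mapping from ROA identifier (for example, the ROA object URI) to the hex digest of its content. RWR_THRESHOLD is a hypothetical ISP setting.</t>
		<sourcecode type="python"><![CDATA[
# Sketch: ROA churn indicators (RCR, RWR) between two snapshots.
from typing import Dict, Optional, Tuple

RWR_THRESHOLD = 0.10  # hypothetical ISP-configured threshold


def roa_churn(previous: Dict[str, str], current: Dict[str, str]
              ) -> Tuple[Optional[float], Optional[float]]:
    """Return (RCR, RWR); both are None if previous_roa_count is zero."""
    prev_ids, curr_ids = set(previous), set(current)
    added_roas = len(curr_ids - prev_ids)
    removed_roas = len(prev_ids - curr_ids)
    modified_roas = sum(1 for r in prev_ids & curr_ids
                        if previous[r] != current[r])
    previous_roa_count = len(prev_ids)
    if previous_roa_count == 0:
        return None, None
    rcr = (added_roas + removed_roas + modified_roas) / previous_roa_count
    rwr = removed_roas / previous_roa_count
    return rcr, rwr
]]></sourcecode>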
		</section>
	</section>
	
	<section>
		<name> Certificate and CA Stability Indicators </name>
			<section>
			<name> Certificate Change Ratio (CCR) </name>
			<t> CCR = (added_certs + removed_certs + modified_certs) / previous_cert_count </t>
			<t> Large values can indicate key rollover, mass reissuance, misconfiguration, or other abnormal behavior. </t>
			</section>
			
			<section>
			<name> Expired Object Ratio (EOR) </name>
			<t> EOR = expired_objects / total_objects </t>
			<t> Expired objects are not expected to appear in a properly maintained repository. </t>
			<t> Values greater than zero SHOULD trigger alerts. </t>
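			<t>The following Python code is a minimal sketch of computing EOR. It assumes that the end of each object's validity period has already been extracted as a timezone-aware datetime, for example the notAfter time of the object's certificate. Parsing is out of scope for this sketch.</t>
			<sourcecode type="python"><![CDATA[
# Sketch: Expired Object Ratio (EOR).
from datetime import datetime, timezone
from typing import Dict, Optional


def expired_object_ratio(not_after: Dict[str, datetime],
                         now: Optional[datetime] = None) -> Optional[float]:
    # EOR = expired_objects / total_objects
    if not not_after:
        return None  # no objects: ratio undefined
    if now is None:
        now = datetime.now(timezone.utc)
    expired_objects = sum(1 for t in not_after.values() if t < now)
    return expired_objects / len(not_after)
]]></sourcecode>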
			</section>
			
			<section>
			<name> Invalid Object Ratio (IOR) </name>
			<t> IOR = invalid_objects / total_objects </t>
			<t> Increasing IOR over time can indicate publication or signing defects. </t>
			</section>
	</section>		
		
	<section>
	<name> RRDP Publication Continuity</name>
		
		<section>
		<name> Serial Progression Delta (SPD) </name>
		<t> SPD = current_serial − previous_serial </t>
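		<t> Within a single RRDP session, SPD is expected to be greater than or equal to zero. Because a new RRDP session identifier starts a new serial number sequence, SPD is meaningful only when the session identifier is unchanged. </t>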
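		<t>The following Python code is a minimal sketch of computing SPD. RRDP serial numbers are comparable only within one session_id, so this sketch returns None when the session identifier has changed and the values cannot be compared. The function name is hypothetical.</t>
		<sourcecode type="python"><![CDATA[
# Sketch: Serial Progression Delta (SPD), session-aware.
from typing import Optional


def serial_progression_delta(previous_session: str, previous_serial: int,
                             current_session: str,
                             current_serial: int) -> Optional[int]:
    if previous_session != current_session:
        return None  # new RRDP session: serials are not comparable
    # SPD = current_serial - previous_serial
    return current_serial - previous_serial
]]></sourcecode>
		<t>A negative result within the same session indicates serial regression, which warrants investigation.</t>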
		</section>
			
		<section>
		<name> Delta Volume (DV) </name>
		<t> DV = number_of_objects_changed_in_rrdp_delta </t>
		<t> Large deltas can indicate excessive churn. </t>
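		<t>The following Python code is a minimal sketch of computing DV for one RRDP delta file. It counts the publish and withdraw elements in the delta document, using the RRDP XML namespace defined in RFC 8182. Fetching the delta file is out of scope. For untrusted input, a hardened XML parser may be preferable.</t>
		<sourcecode type="python"><![CDATA[
# Sketch: Delta Volume (DV) from an RRDP delta document.
import xml.etree.ElementTree as ET

RRDP_NS = "http://www.ripe.net/rpki/rrdp"


def delta_volume(delta_xml: str) -> int:
    root = ET.fromstring(delta_xml)
    if root.tag != f"{{{RRDP_NS}}}delta":
        raise ValueError("not an RRDP delta document")
    # Each publish or withdraw element changes exactly one object.
    publish = root.findall(f"{{{RRDP_NS}}}publish")
    withdraw = root.findall(f"{{{RRDP_NS}}}withdraw")
    return len(publish) + len(withdraw)
]]></sourcecode>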
		</section>

	</section>
	
	<section>
    <name>Alerting Guidance</name>
	<t> Monitoring systems SHOULD generate alarms when RCR or CCR significantly exceeds historical norms, when RWR or EOR exceeds an operator-defined threshold, or when SPD is negative, or unexpectedly zero, within the same RRDP session.  </t>
    </section>
	
</section>



    <section anchor="Security">
      <name>Security Considerations</name>
      <t>This document defines operational monitoring metrics for assessing the reachability, availability, integrity, and stability of RPKI repositories. It does not modify the RPKI trust model, cryptographic validation procedures, or protocol behavior.
      </t>
    </section>

    <section anchor="IANA">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>
    

    
  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        
        
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.6480.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.5718.xml"/>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.8182.xml"/>


      </references>
 
      <references>
        <name>Informative References</name>
        <reference anchor="I-D.ietf-sidrops-publication-server-bcp-profile" target="https://datatracker.ietf.org/doc/draft-ietf-sidrops-publication-server-bcp/05/">
        <front>
        <title> RPKI Publication Server Best Current Practices</title>
        <author initials="T." surname="Bruijnzeels" fullname="Tim Bruijnzeels">
        <organization>RIPE NCC</organization>
        </author>
        <author initials="T." surname="de Kock" fullname="Ties de Kock">
        <organization>RIPE NCC</organization>
        </author>
        <author initials="F." surname="Hill" fullname="Frank Hill">
        <organization>ARIN</organization>
        </author>
        <author initials="T." surname="Harrison" fullname="Tom Harrison">
        <organization>APNIC</organization>
		</author>
        <date month="October" day="20" year="2025"/>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-ietf-sidrops-publication-server-bcp-05"/>
        </reference>
      </references>
    </references>
    
    





 </back>
</rfc>
