<?xml version='1.0' encoding='UTF-8'?>
<rfc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" category="info" docName="draft-grimminck-safe-ioc-sharing-07" ipr="trust200902" xml:lang="en" version="3" submissionType="independent">
  <front>
    <title abbrev="Safe-IOC Sharing">A Standard for Safe and Reversible Sharing of Malicious URLs and Indicators</title>
    <author fullname="Stefan Grimminck" role="editor">
      <address>
        <email>ietf@stefangrimminck.nl</email>
      </address>
    </author>
    <date year="2026" month="March" day="31"/>
    <abstract>
      <t>This document codifies a consistent and reversible convention used in the threat intelligence and security communities for sharing potentially malicious indicators of compromise (IOCs), such as URLs, IP addresses, email addresses, and domain names. It describes a safe obfuscation format that reduces the risk of accidental execution or activation when IOCs are displayed or transmitted. These conventions aim to improve interoperability among tools and feeds that exchange threat intelligence data.</t>
    </abstract>
  </front>
  <middle>
    <section title="Introduction">
      <t>The secure sharing of malicious artifacts is vital to threat intelligence, open-source intelligence (OSINT), and incident response. However, sharing raw URLs, IP addresses, and email addresses associated with malware or threat actors poses a risk of accidental activation.</t>
      <t>Participants who routinely share indicators of compromise (IOCs) include security operations center (SOC) analysts, computer emergency response teams (CERTs), OSINT researchers, incident responders, and vendors of threat intelligence platforms and feeds. IOCs appear in email threads, instant-messaging channels, ticketing systems, PDF and HTML reports, blog posts, paste sites, and machine-readable formats such as STIX/TAXII. Both human readers and automated pipelines consume this material.</t>
      <t>When a raw URI such as "https://malicious-host.example/path" is embedded in those channels, many systems automatically detect it and render it as a clickable or otherwise actionable link. An analyst may then activate the resource unintentionally: navigating to an attacker-controlled URI can reveal the analyst's IP address and organizational affiliation, trigger delivery of malware, or alert the threat actor that a particular indicator is under active investigation. Some mail and web infrastructure pre-fetches or resolves links for scanning or preview purposes, producing the same exposure without any deliberate user action. PDF viewers and rich-text editors may turn strings that resemble URIs into hyperlinks even when the author intended plain text.</t>
      <t>A longstanding, well-established practice in the security community is to alter IOCs so that they remain human-readable but are not treated as live URIs by typical software: for example, replacing "http" with "hxxp" and "." with "[.]". Many variant spellings exist (e.g., "h**p", different bracketing conventions for dots). That inconsistency hinders reliable parsing, exchange, and automation. This document codifies the most widely adopted spellings and a canonical order of operations so that independent implementations can interoperate. It does not introduce new URI schemes.</t>
      <t>The character sequences used in this convention (e.g., "hxxp", "hxxps", "fxp") are not URI schemes as defined in Section 3.1 of <xref target="RFC3986"/>. They are not registered in the IANA URI Schemes registry. This document does not recommend or endorse the practice of placing unregistered strings in the syntactic position of a URI scheme; new specifications SHOULD NOT define such strings. Implementations MUST NOT treat the sequences documented here as resolvable URI schemes.</t>
      <t>This document records an existing operational convention that arose organically in the threat intelligence community and is already in widespread use. The convention predates this standardization effort. The purpose of this document is to improve interoperability among implementations that already follow this convention, not to advocate for its pattern of URI scheme substitution.</t>
      <t>The authors acknowledge that strings occupying the same syntactic position as a URI scheme create a potential for confusion with the IANA URI Schemes registry. This document SHOULD NOT be cited as precedent for other specifications to place unregistered strings in the URI scheme namespace. The security implications of this namespace overlap, including the possibility of a future registry collision, are discussed in <xref target="security"/>.</t>
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they appear in all capitals, as shown here.</t>
    </section>

    <section title="Terminology">
      <t><strong>Obfuscating:</strong> The process of altering an indicator so that it cannot be accidentally activated or clicked. The goal is to prevent automatic execution or resolution, not to conceal the content from human readers; the original indicator remains visually recognizable.</t>
      <t><strong>De-obfuscating:</strong> The process of restoring an obfuscated indicator to its original, actionable form.</t>
      <t><strong>IOC:</strong> Indicator of Compromise. Data, such as a URL, IP address, domain name, email address, or hash, that is associated with malicious activity.</t>
    </section>

    <section title="Problem Statement">
      <t>Inconsistent obfuscation practices hinder the reliable and automated exchange of threat intelligence. For example:</t>
      <ul>
        <li>A URL obfuscated as "h**p://example[.]example" cannot be reliably parsed by tools expecting "hxxp://example[.]example".</li>
        <li>An IP address obfuscated with parentheses (e.g., "192.0.2(.)1") may fail to de-obfuscate in systems expecting "[.]".</li>
      </ul>
      <t>Such inconsistencies reduce the effectiveness of threat detection and response.</t>
    </section>

    <section title="Canonical Transformation Rule" anchor="canonical">
      <t>To prevent nested obfuscation (e.g., "hxxps://example[[.]]example") when an LLM or tool processes the same string twice or in the wrong order, implementations MUST apply the transformations in the strict order given below. Implementations MUST treat already-obfuscated substrings (e.g., "[.]", "[@]") as opaque and MUST NOT apply transformations to them again; this makes the transformation idempotent. Percent-encoded characters (such as "%2e" for ".") SHOULD be avoided to prevent ambiguity.</t>

      <section title="Step 1: Scheme">
        <t>Identify and replace the scheme first. The following mappings are the canonical set documented by this specification:</t>
        <table>
          <thead>
            <tr><th>Original Scheme</th><th>Obfuscated Form</th></tr>
          </thead>
          <tbody>
            <tr><td>http</td><td>hxxp</td></tr>
            <tr><td>https</td><td>hxxps</td></tr>
            <tr><td>ftp</td><td>fxp</td></tr>
          </tbody>
        </table>
        <t>These are specific, well-known tokens adopted by convention in the threat intelligence community. Schemes not listed above (e.g., "sftp", "telnet", "ftps") do not yet have widely established obfuscated forms. Implementations encountering such schemes SHOULD leave them unmodified and rely on host-level obfuscation (Steps 2-3) to prevent accidental activation.</t>
      </section>

      <section title="Step 2: Userinfo">
        <t>Identify the "@" delimiter that terminates the userinfo subcomponent (per Section 3.2.1 of <xref target="RFC3986"/>) and replace it with "[@]". This applies to URIs containing userinfo (e.g., "username:password@host") and to the "@" that separates the local part from the domain in email addresses.</t>
      </section>

      <section title="Step 3: Host">
        <t>Replace all "." (period) characters in the Host subcomponent with "[.]". This applies to domain names and IPv4 addresses, including standalone values (e.g., "evil.example" or "198.51.100.1" without a scheme). IPv6 literals enclosed in square brackets (e.g., "[2001:db8::1]") MUST be left unchanged: implementations MUST NOT alter the colons, the brackets, or any periods inside the literal (such as those in the IPv4-mapped form "[::ffff:192.0.2.1]").</t>
      </section>

      <section title="Step 4: Stop">
        <t>Do not process the Path, Query, or Fragment components unless they contain nested URIs that require separate obfuscation. Applying transformations beyond the Host in the primary URI may cause incorrect results.</t>
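        <t>As a non-normative illustration, Steps 1 through 4 can be sketched in Python roughly as follows. The names used here (obfuscate, _obfuscate_authority, _obfuscate_host, SCHEME_MAP) and the regular expressions are assumptions made for this sketch and are not defined by this document. The sketch does not look for nested URIs inside the path or query, and it lower-cases a listed scheme before mapping it.</t>
        <sourcecode type="python"><![CDATA[import re

# Step 1 mapping, copied from the canonical table in this section.
SCHEME_MAP = {"http": "hxxp", "https": "hxxps", "ftp": "fxp"}

# scheme "://" authority, then everything from the first "/", "?" or "#".
_URI_RE = re.compile(
    r"^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)$", re.DOTALL
)


def _obfuscate_host(host: str) -> str:
    """Step 3: bracket periods, leaving IPv6 literals untouched."""
    if host.startswith("["):
        end = host.find("]")
        # "[...]" containing ":" is taken to be an IPv6 literal,
        # possibly followed by ":port"; return it unchanged.
        if end != -1 and ":" in host[:end]:
            return host
    # An existing "[.]" is matched as one opaque token and rewritten
    # to itself, so only bare periods actually change.
    return re.sub(r"\[\.\]|\.", "[.]", host)


def _obfuscate_authority(authority: str) -> str:
    """Step 2, then Step 3 on whatever follows the at-sign."""
    last = None
    for last in re.finditer(r"\[@\]|@", authority):
        pass  # keep only the final "@" or "[@]"
    if last is None:
        return _obfuscate_host(authority)
    userinfo = authority[:last.start()]
    host = authority[last.end():]
    return userinfo + "[@]" + _obfuscate_host(host)


def obfuscate(ioc: str) -> str:
    m = _URI_RE.match(ioc)
    if m:
        scheme, authority, rest = m.groups()
        # Step 1: unlisted (or already obfuscated) schemes pass through.
        scheme = SCHEME_MAP.get(scheme.lower(), scheme)
        # Step 4: path, query, and fragment are copied verbatim.
        return scheme + "://" + _obfuscate_authority(authority) + rest
    # Standalone domain name, IPv4 address, or email address.
    head, rest = re.match(r"([^/?#]*)(.*)", ioc, re.DOTALL).groups()
    return _obfuscate_authority(head) + rest
]]></sourcecode>
        <t>With this sketch, obfuscate("http://user:pass@attacker.example") yields "hxxp://user:pass[@]attacker[.]example", and passing that result back to obfuscate returns it unchanged, which is the idempotency property required above.</t>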
      </section>
    </section>

    <section title="Formal ABNF Grammar" anchor="abnf">
      <t>The following grammar uses Augmented BNF (ABNF) per <xref target="RFC5234"/> to illustrate tokens that commonly appear in obfuscated IOC strings. The rules are not an exhaustive registry of every scheme observed in the field; they document the widely used HTTP(S) and FTP forms and the bracketed delimiters. An implementation MAY use this grammar to help validate whether a string is already obfuscated or still requires processing.</t>
      <figure anchor="safe-ioc-abnf">
        <sourcecode type="abnf" name="safe-ioc-abnf"><![CDATA[; Illustrative Safe-IOC tokens (not exhaustive)
safe-scheme   = "hxxp" / "hxxps" / "fxp"
safe-dot      = "[" "." "]"
safe-at       = "[" "@" "]"]]></sourcecode>
      </figure>
      <t>A compliant implementation MUST recognize strings containing safe-scheme, safe-dot, and safe-at as obfuscated when those tokens appear in the roles described in this document. A string that requires obfuscation is one that contains a literal "http", "https", or "ftp" scheme, "." in host/domain contexts, or "@" in userinfo/email contexts without the Safe-IOC bracketing.</t>
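      <t>One possible way to apply these tokens in a validator is sketched below in Python. This is a non-normative illustration: the function name classify, its four return labels, and the regular expressions are assumptions of the sketch, not part of this document. It inspects only the scheme and the authority, skips bracketed IPv6 literals, and reports a string as "partial" when raw and obfuscated tokens are mixed.</t>
      <sourcecode type="python"><![CDATA[import re

SAFE_SCHEMES = ("hxxps", "hxxp", "fxp")   # safe-scheme in the ABNF
RAW_SCHEMES = ("https", "http", "ftp")    # schemes from the Step 1 table

# Optional scheme "://", then the authority up to "/", "?" or "#".
_SPLIT_RE = re.compile(r"^(?:([A-Za-z][A-Za-z0-9+.-]*)://)?([^/?#]*)")

# A bracketed literal that contains at least one colon (IPv6).
_IPV6_RE = re.compile(r"\[[0-9A-Fa-f:.]*:[0-9A-Fa-f:.]*\]")


def classify(ioc: str) -> str:
    """Return "obfuscated", "raw", "partial", or "none"."""
    scheme, authority = _SPLIT_RE.match(ioc).groups()
    scheme = (scheme or "").lower()
    # IPv6 literals are never bracket-obfuscated, so ignore them.
    authority = _IPV6_RE.sub("", authority)

    safe = (
        scheme in SAFE_SCHEMES
        or "[.]" in authority      # safe-dot
        or "[@]" in authority      # safe-at
    )
    # Whatever "." or "@" remains after removing the safe tokens is raw.
    leftover = authority.replace("[.]", "").replace("[@]", "")
    raw = scheme in RAW_SCHEMES or "." in leftover or "@" in leftover

    if safe and raw:
        return "partial"
    if safe:
        return "obfuscated"
    if raw:
        return "raw"
    return "none"
]]></sourcecode>
      <t>Under these assumptions, classify("hxxps://bad[.]example") returns "obfuscated", classify("https://bad.example") returns "raw", and classify("hxxps://example.example") returns "partial".</t>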
    </section>

    <section title="De-obfuscation Techniques" anchor="deobfuscation">
      <t>Tools designed to ingest obfuscated data SHOULD automatically reverse these transformations in a deterministic manner:</t>
      <ul>
        <li>Convert "hxxps" back to "https".</li>
        <li>Convert "hxxp" back to "http".</li>
        <li>Convert "fxp" back to "ftp".</li>
        <li>Convert "[.]" back to ".".</li>
        <li>Convert "[@]" back to "@".</li>
      </ul>
      <t>When one scheme token is a prefix of another, the longer token MUST be reversed first (e.g., reverse "hxxps" before "hxxp"). De-obfuscation MUST preserve the original semantics of the data to avoid misinterpretation.</t>
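      <t>The reversal can be sketched in Python as follows. This is a non-normative illustration; the names deobfuscate and REVERSE_SCHEMES and the regular expression are assumptions of the sketch. It matches the whole scheme token rather than a prefix, so "hxxps" can never be mistaken for "hxxp", and it reverses "[.]" and "[@]" only in the authority so that path, query, and fragment keep their original content. The function only returns a string; deciding where that string may safely be written is outside its scope.</t>
      <sourcecode type="python"><![CDATA[import re

# Reverse of the canonical scheme table.
REVERSE_SCHEMES = {"hxxps": "https", "hxxp": "http", "fxp": "ftp"}

# Optional scheme "://", authority, then the untouched remainder.
_PARTS_RE = re.compile(
    r"^(?:([A-Za-z][A-Za-z0-9+.-]*)://)?([^/?#]*)(.*)$", re.DOTALL
)


def deobfuscate(ioc: str) -> str:
    scheme, authority, rest = _PARTS_RE.match(ioc).groups()
    # Restore the delimiters in the authority only.
    authority = authority.replace("[.]", ".").replace("[@]", "@")
    if scheme is None:
        # Standalone domain name, IP address, or email address.
        return authority + rest
    # Unlisted schemes were never obfuscated, so they pass through.
    scheme = REVERSE_SCHEMES.get(scheme.lower(), scheme)
    return scheme + "://" + authority + rest
]]></sourcecode>
      <t>For example, under these assumptions deobfuscate("hxxp://user:pass[@]attacker[.]example") returns "http://user:pass@attacker.example", and deobfuscate("192[.]0[.]2[.]1") returns "192.0.2.1".</t>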

      <section title="Safety Check for Reversibility">
        <t>De-obfuscation MUST only be performed when the output is written to a non-executable buffer (e.g., a variable, string, or file) that cannot be automatically interpreted, executed, or rendered as a clickable link by the system or application. The tool MUST NOT de-obfuscate a string if it is currently being rendered in a "live" environment (e.g., a web browser preview, an active document viewer, or any context where the resulting string could be automatically executed, resolved, or displayed as a clickable link).</t>
        <t>De-obfuscation SHOULD only occur in controlled contexts such as:</t>
        <ul>
          <li>Command-line tools with explicit user confirmation</li>
          <li>Isolated analysis environments (sandboxes)</li>
          <li>Backend processing pipelines that do not render output to users</li>
        </ul>
        <t>Accidental activation during the de-obfuscation process poses a security risk and MUST be prevented.</t>
      </section>
    </section>

    <section title="Example Use Cases">
      <t>Common scenarios include:</t>
      <ul>
        <li><strong>OSINT Sharing:</strong> A report lists obfuscated URLs (e.g., "hxxp://malware[.]example/payload") to prevent accidental clicks.</li>
        <li><strong>Email Communication:</strong> Security teams share obfuscated IOCs like "attacker[@]example[.]example" in email threads.</li>
        <li><strong>Threat Intelligence Platforms:</strong> Automated ingestion of obfuscated IPs (e.g., "192[.]0[.]2[.]1") for blocklist updates.</li>
      </ul>
    </section>

    <section title="Implementation Guidance">
      <t>Software designed to parse threat intelligence feeds should explicitly support these obfuscation and de-obfuscation conventions. Implementations SHOULD verify correct behavior through unit tests and validation scripts using the test vectors in <xref target="test-vectors"/>.</t>
    </section>

    <section title="Edge Cases and Special Handling">
      <t><strong>Internationalized Domain Names (IDNs):</strong> Obfuscate IDNs in their Punycode form in the same way as any other domain name (e.g., "xn--n3h.example" becomes "xn--n3h[.]example").</t>
      <t><strong>Non-Standard URI Schemes:</strong> Schemes not listed in the canonical mapping table in Step 1 of <xref target="canonical"/> SHOULD be left unmodified. Implementations SHOULD rely on host-level obfuscation (Steps 2-3) to prevent accidental activation for such schemes.</t>
      <t><strong>IPv6 Literals in URIs:</strong> Do not alter colon characters (":") or brackets ("[", "]") in IPv6 addresses. For example, "[2001:db8::1]" MUST remain unchanged. Only scheme names or domain elements surrounding them should be obfuscated.</t>
    </section>

    <section title="Test Vectors" anchor="test-vectors">
      <t>The following list provides a "golden set" of inputs and expected outputs. Domain names use the "example" reserved space per <xref target="RFC2606"/>; IPv4 addresses use documentation ranges per <xref target="RFC5737"/>; IPv6 addresses use the documentation prefix per <xref target="RFC3849"/>. Implementations SHOULD use these vectors to ensure correct behavior and to avoid under-obfuscation (e.g., missing email addresses) or over-obfuscation (e.g., obfuscating IPv6 colons).</t>
      <ul>
        <li>Standard URL: https://bad.example -&gt; hxxps://bad[.]example</li>
        <li>URL with path: https://evil.example/path -&gt; hxxps://evil[.]example/path</li>
        <li>Deep-link URL: https://bad.example/path/to/page?q=1#frag -&gt; hxxps://bad[.]example/path/to/page?q=1#frag</li>
        <li>HTTP URL: http://attacker.example -&gt; hxxp://attacker[.]example</li>
        <li>FTP URL: ftp://files.example/ -&gt; fxp://files[.]example/</li>
        <li>IPv4 address: 198.51.100.1 -&gt; 198[.]51[.]100[.]1</li>
        <li>IPv4 in URL: http://192.0.2.1 -&gt; hxxp://192[.]0[.]2[.]1</li>
        <li>IPv6 in URL: http://[2001:db8::1]:8080 -&gt; hxxp://[2001:db8::1]:8080</li>
        <li>IPv4-mapped IPv6: http://[::ffff:192.0.2.1] -&gt; hxxp://[::ffff:192.0.2.1]</li>
        <li>Email address: phish@target.example -&gt; phish[@]target[.]example</li>
        <li>Punycode domain: xn--n3h.example -&gt; xn--n3h[.]example</li>
        <li>URL with userinfo: http://user:pass@attacker.example -&gt; hxxp://user:pass[@]attacker[.]example</li>
        <li>Idempotency check: hxxps://bad[.]example -&gt; hxxps://bad[.]example</li>
      </ul>
      <t>Note: The IPv6 rows demonstrate that nothing within an IPv6 literal is altered: colons and brackets MUST NOT be changed, and the periods in the IPv4-mapped form (::ffff:192.0.2.1) are likewise left as they are. The deep-link row shows that Path, Query, and Fragment (per Step 4) are not processed. The Punycode row shows that IDN labels in Punycode form receive the same "[.]" treatment as regular domain labels. The idempotency row confirms that applying the transformation to an already-obfuscated string produces no change.</t>
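      <t>For use in unit tests, the vectors above can be expressed as data, as in the following non-normative Python sketch. The names VECTORS and run_vectors are assumptions of the sketch; run_vectors accepts any callable that maps an input string to its obfuscated form and reports the vectors for which the result differs from the expected output.</t>
      <sourcecode type="python"><![CDATA[# (input, expected output) pairs, in the order listed above.
VECTORS = [
    ("https://bad.example", "hxxps://bad[.]example"),
    ("https://evil.example/path", "hxxps://evil[.]example/path"),
    ("https://bad.example/path/to/page?q=1#frag",
     "hxxps://bad[.]example/path/to/page?q=1#frag"),
    ("http://attacker.example", "hxxp://attacker[.]example"),
    ("ftp://files.example/", "fxp://files[.]example/"),
    ("198.51.100.1", "198[.]51[.]100[.]1"),
    ("http://192.0.2.1", "hxxp://192[.]0[.]2[.]1"),
    ("http://[2001:db8::1]:8080", "hxxp://[2001:db8::1]:8080"),
    ("http://[::ffff:192.0.2.1]", "hxxp://[::ffff:192.0.2.1]"),
    ("phish@target.example", "phish[@]target[.]example"),
    ("xn--n3h.example", "xn--n3h[.]example"),
    ("http://user:pass@attacker.example",
     "hxxp://user:pass[@]attacker[.]example"),
    ("hxxps://bad[.]example", "hxxps://bad[.]example"),
]


def run_vectors(obfuscate):
    """Return (input, expected, actual) for every failing vector."""
    failures = []
    for raw, expected in VECTORS:
        actual = obfuscate(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures
]]></sourcecode>
      <t>An implementation under test passes when run_vectors returns an empty list. A function that returns its input unchanged fails every vector except the idempotency check.</t>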
    </section>

    <section title="Security Considerations" anchor="security">
      <t>While these obfuscation techniques reduce the risk of accidental activation of malicious indicators, obfuscated data SHOULD always be handled with caution.</t>

      <section title="Relationship to the URI Scheme Registry">
        <t>As noted in the Introduction, the obfuscated scheme prefixes documented here are not URI schemes and MUST NOT be treated as valid or resolvable URI schemes by generic URI parsers. Software that encounters these strings outside of a threat intelligence context MUST NOT attempt to resolve or dereference them.</t>
        <t>If a future assignment in the IANA URI Schemes registry were to collide with one of these strings, implementations would need to disambiguate by context (IOC obfuscation versus a registered scheme). Implementers should be aware of this possibility and design parsers to consider the surrounding context when interpreting these tokens.</t>
      </section>

      <section title="Partial Obfuscation">
        <t>When the original scheme appears in the canonical mapping table (<xref target="canonical"/>), a compliant tool MUST obfuscate both the scheme and the host-level delimiters (periods, at-sign). For schemes not listed in that table, host-level obfuscation alone (Steps 2-3) is sufficient, because no canonical obfuscated scheme form exists. Partial obfuscation of a listed scheme (for example, replacing only "." with "[.]" while leaving "https" unchanged) creates a false sense of security. A user may incorrectly assume a URL is safe because the period is bracketed, when the scheme remains active and could still trigger automatic linkification or execution in some environments. Implementations MUST NOT produce partially obfuscated output when full obfuscation is intended.</t>
      </section>

      <section title="Parser Confusion">
        <t>Implementations that parse Safe-IOC strings may become confused by malformed or inconsistently obfuscated input. For example, "hxxps://example.example" (scheme obfuscated but dots not) or "https://example[.]example" (dots obfuscated but scheme not) are not valid Safe-IOC formats. Parsers SHOULD validate that obfuscated strings conform to the canonical transformation rule and the illustrative ABNF before de-obfuscation. Rejecting or flagging ambiguous input reduces the risk of misinterpretation.</t>
      </section>

      <section title="De-obfuscation in Non-Executable Contexts">
        <t>As stated in <xref target="deobfuscation"/>, de-obfuscation MUST only occur when the result is placed in a non-executable buffer. A non-executable buffer is one that cannot be automatically interpreted by the system (e.g., as a URI to fetch, a command to run, or a link to display). Writing de-obfuscated output into a live document, rich-text editor, or browser address bar before explicit user action creates an unacceptable risk of accidental activation.</t>
      </section>

      <section title="Additional Considerations">
        <ul>
          <li>Implementations that do not follow the canonical transformation rule (e.g., by not treating "[.]" and "[@]" as opaque) may produce nested or non-reversible output when obfuscation is applied repeatedly. Compliant implementations avoid this by design.</li>
          <li>Obfuscated URLs in PDFs may still be rendered as hyperlinks; use plain-text formatting.</li>
          <li>Systems processing obfuscated indicators MUST treat them as potentially harmful data, applying sandboxing or isolated environments for analysis.</li>
          <li>Credentials (e.g., <em>username:password</em>) SHOULD NOT be shared, even in obfuscated form, due to inherent security risks.</li>
        </ul>
      </section>
    </section>

    <section title="IANA Considerations">
      <t>This document has no IANA actions.</t>
    </section>
  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <reference anchor="RFC2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author initials="S." surname="Bradner" fullname="Scott Bradner"/>
            <date year="1997" month="March"/>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
        </reference>
        <reference anchor="RFC3986">
          <front>
            <title>Uniform Resource Identifier (URI): Generic Syntax</title>
            <author initials="T." surname="Berners-Lee" fullname="Tim Berners-Lee"/>
            <author initials="R." surname="Fielding" fullname="Roy T. Fielding"/>
            <author initials="L." surname="Masinter" fullname="Larry Masinter"/>
            <date year="2005" month="January"/>
          </front>
          <seriesInfo name="STD" value="66"/>
          <seriesInfo name="RFC" value="3986"/>
        </reference>
        <reference anchor="RFC5234">
          <front>
            <title>Augmented BNF for Syntax Specifications: ABNF</title>
            <author initials="D." surname="Crocker" fullname="Dave Crocker" role="editor"/>
            <author initials="P." surname="Overell" fullname="Paul Overell"/>
            <date year="2008" month="January"/>
          </front>
          <seriesInfo name="STD" value="68"/>
          <seriesInfo name="RFC" value="5234"/>
        </reference>
      </references>
      <references>
        <name>Informative References</name>
        <reference anchor="RFC2606">
          <front>
            <title>Reserved Top Level DNS Names</title>
            <author initials="D." surname="Eastlake" fullname="Donald E. Eastlake 3rd"/>
            <author initials="A." surname="Panitz" fullname="Aliza R. Panitz"/>
            <date year="1999" month="June"/>
          </front>
          <seriesInfo name="BCP" value="32"/>
          <seriesInfo name="RFC" value="2606"/>
        </reference>
        <reference anchor="RFC3849">
          <front>
            <title>IPv6 Address Prefix Reserved for Documentation</title>
            <author initials="G." surname="Huston" fullname="Geoff Huston"/>
            <author initials="A." surname="Lord" fullname="Anne Lord"/>
            <author initials="P." surname="Smith" fullname="Philip Smith"/>
            <date year="2004" month="July"/>
          </front>
          <seriesInfo name="RFC" value="3849"/>
        </reference>
        <reference anchor="RFC5737">
          <front>
            <title>IPv4 Address Blocks Reserved for Documentation</title>
            <author initials="J." surname="Arkko" fullname="Jari Arkko"/>
            <author initials="M." surname="Cotton" fullname="Michelle Cotton"/>
            <author initials="L." surname="Vegoda" fullname="Leo Vegoda"/>
            <date year="2010" month="January"/>
          </front>
          <seriesInfo name="RFC" value="5737"/>
        </reference>
        <reference anchor="RFC8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author initials="B." surname="Leiba" fullname="Barry Leiba"/>
            <date year="2017" month="May"/>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
        </reference>
      </references>
    </references>
  </back>
</rfc>
