Network Working Group                                            T. Sato
Internet-Draft                                           MyAuberge K.K.
Intended status: Standards Track                          24 May 2026
Expires: 24 November 2026


      Progressive Trust (PT) for Agentic AI Governance Systems
                        draft-sato-soos-pt-00

Abstract

   When a new employee joins an organization, they begin with limited
   authority.  As they demonstrate good judgment -- completing tasks
   reliably, asking for guidance at the right moments, recovering well
   when things go wrong -- they earn greater trust and, with it, greater
   authority.  If their performance degrades, or if months pass without
   any demonstration, that trust diminishes.  This is how human
   organizations manage authority over time.  AI agents have no
   equivalent mechanism.

   Today, an AI agent's authority is declared once in a credential at
   issuance time and does not respond to its behavioral record.  An
   agent that has completed 200 successful sessions with a proven track
   record holds the same credential as a newly deployed agent.  The
   human principal who issued both credentials made a judgment at
   issuance time; nothing that happened since is reflected in the
   agent's authority.

   This document defines Progressive Trust (PT): a behavioral trust
   model for AI agents in which authority recommendations evolve in
   response to cryptographically verified evidence of actual
   performance.  PT measures five behavioral properties: whether the
   agent's self-assessed confidence matches its actual outcomes;
   whether it asks for human oversight at the right moments; whether
   it achieves its goals; whether it avoids decisions it later has to
   reverse; and whether it adapts when its action is rejected.  These
   measures are derived exclusively from the tamper-evident, GEC-signed
   Event Stream -- an agent cannot influence its PT Score except through
   actual governed behavior.

   PT does not grant authority automatically.  It generates structured
   recommendations, backed by behavioral evidence, for human principal
   review and approval.  Human principals decide whether to elevate or
   reduce an agent's authority.  PT ensures that decision is informed
   rather than made in the absence of history.

   Progressive Trust is the longitudinal complement of the Agent
   Execution Protocol [I-D.sato-soos-aep]: AEP governs what an agent
   does within a session; PT measures what an agent has done across
   sessions and translates that history into structured authority
   recommendations.  To the author's knowledge, no equivalent
   specification exists in IETF, ISO, NIST, or any agentic AI
   governance standards body.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 24 November 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.


Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  How PT Scores Are Used
     3.1.  Use Case 1 -- Choosing the Right Agent for the Task
     3.2.  Use Case 2 -- Informing Human Decisions at Escalation
     3.3.  Use Case 3 -- Authority Evolution Over Time
     3.4.  Use Case 4 -- Post-Incident Forensics and Audit
     3.5.  Summary
   4.  Problem Statement
     4.1.  The Static Credential Gap
     4.2.  Why Behavioral History Must Inform Authority
     4.3.  The Non-Suppressibility Requirement
   5.  The PT Score
     5.1.  Architecture
     5.2.  Dimension 1 -- Self-Assessment Score (SAS)
     5.3.  Dimension 2 -- Judgment Score (JS)
     5.4.  Dimension 3 -- Effectiveness Score (ES)
     5.5.  Dimension 4 -- Precision Score (PS)
     5.6.  Dimension 5 -- Adaptability Score (AS)
     5.7.  Composite PT Score
   6.  Trust Decay Model
     6.1.  Decay Principle
     6.2.  Per-Dimension Decay
     6.3.  Decay Parameters
     6.4.  Decay and the Mandate Ceiling
   7.  ProgressiveTrustSummary
     7.1.  Purpose
     7.2.  Schema
     7.3.  Delivery at HEM Escalation
   8.  PT-Informed Mandate Management
     8.1.  Authority Evolution Model
     8.2.  Elevation Recommendations
     8.3.  Reduction Actions
     8.4.  Human Principal Approval Requirement
   9.  Zone B Access and PT Score
   10. PT Score Storage and Computation
     10.1. Party Registry PT Record
     10.2. Event Stream as Canonical Source
     10.3. Analytics Principal and Tier 2 Computation
   11. PT Event Log Integration
     11.1. PT_SCORE_UPDATED
     11.2. PT_RECOMMENDATION_ISSUED
     11.3. PT_RECOMMENDATION_APPLIED
   12. Relationship to Other SOOS Drafts
   13. Security Considerations
   14. Privacy Considerations
   15. IANA Considerations
   16. References
     16.1. Normative References
     16.2. Informative References
   Appendix A.  Azusa Journey -- Progressive Trust Walk-Through


1.  Introduction

   Consider two AI agents operating in the same system.  Agent A was
   deployed yesterday.  Agent B has completed 200 governed sessions
   over three months: it consistently declares accurate confidence,
   asks for human oversight at appropriate moments, achieves its
   declared goals, rarely needs to undo its own decisions, and adapts
   correctly when the GEC rejects an action.  Both agents hold a
   Mandate JWT issued by the same human principal.  The credentials
   are identical.  From the authorization system's perspective, the
   two agents are the same.

   They are not the same.  The behavioral evidence that distinguishes
   them exists -- in the tamper-evident, GEC-signed Event Stream that
   the SOOS governance stack generates for every governed session.
   What is missing is a specification for how that evidence is
   measured, aggregated, and translated into structured authority
   recommendations.  That is what this document provides.

   Progressive Trust (PT) is a behavioral trust model for AI agents.
   It specifies how the GEC measures five properties of an agent's
   behavior across sessions and how those measurements feed structured
   recommendations -- for human principal approval -- about whether
   the agent's authority should increase, decrease, or remain the same.

   The five properties PT measures are deliberately chosen to reflect
   the qualities a human principal actually cares about when deciding
   whether to extend greater authority to an agent:

   1.  Does the agent know what it does not know?  (Self-Assessment)
   2.  Does it ask for help at the right moments?  (Judgment)
   3.  Does it finish what it starts?  (Effectiveness)
   4.  Does it avoid decisions it later has to reverse?  (Precision)
   5.  When told no, does it adapt?  (Adaptability)

   Each property is measured from the Event Stream -- from GEC-signed
   records the agent cannot modify.  An agent cannot improve its PT
   Score by claiming better behavior; it can only improve it by
   demonstrating better behavior.

   PT has three design principles that distinguish it from a simple
   performance score:

   Trust is earned, not held.  An agent begins with a neutral baseline.
   It earns higher trust through demonstrated behavior.  It does not
   receive trust as a starting asset.

   Trust decays without demonstration.  An agent that performed well
   six months ago and has been inactive since has uncertain current
   trustworthiness.  PT scores decay toward the baseline during
   inactivity, preventing the permanent banking of historical
   performance against future authority claims.

   Authority changes require human approval.  PT generates
   recommendations; human principals make decisions.  The GEC never
   autonomously elevates an agent's authority.  Reduction of authority
   in response to strongly negative behavioral signals may be
   configured as automatic by operators, but elevation is always a
   human decision.

   This specification defines PT as a Tier 2 analytics function
   [I-D.sato-soos-idp] Section 3.5: it operates across sessions within
   a single operator's trust domain.  Cross-operator aggregation of PT
   signals -- federated agent reputation -- is a Tier 3 function
   specified in [I-D.sato-soos-faip].


2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   This document uses the following terms:

   Progressive Trust (PT):
      The behavioral trust model defined in this document, comprising
      the PT Score, the Trust Decay Model, and the PT-Informed Mandate
      Management system.

   PT Score:
      A structured, multi-dimensional behavioral measurement for a
      specific agent identity, computed from the GEC-signed Event
      Stream across AEP Sessions.  The PT Score is not a single number;
      it is a vector of dimension scores, each in the range [0.0, 1.0],
      each with an associated decay timestamp and session count.

   PT Dimension:
      One of five behavioral measurement axes: Self-Assessment Score
      (SAS), Judgment Score (JS), Effectiveness Score (ES), Precision
      Score (PS), and Adaptability Score (AS).  Each answers a plain
      question about the agent's behavior: Does it know what it doesn't
      know?  Does it ask for help at the right moments?  Does it finish
      what it starts?  Does it avoid reversing its own decisions?  When
      told no, does it adapt?

   Trust Decay:
      The process by which a PT Dimension score decays toward the PT
      Baseline over time in the absence of new behavioral signals.
      Decay prevents an agent from permanently banking historical
      performance against future authority claims.

   PT Baseline:
      The PT Score assigned to an agent at first deployment, before
      any behavioral evidence is available.  Operator-configured;
      RECOMMENDED value is 0.5 for all dimensions.

   PT Ceiling:
      The maximum PT Score value achievable.  Fixed at 1.0.

   Self-Assessment Score (SAS):
      The correlation between declared IDP confidence values and Cedar
      evaluation outcomes across AEP Sessions.  Does the agent know
      what it does not know?

   Judgment Score (JS):
      The quality of agent-initiated HEM escalation decisions, measured
      by the outcomes of those escalations as determined by human
      principal decisions.  Does the agent ask for help at the right
      moments?

   Effectiveness Score (ES):
      The fraction of AEP Sessions in which the agent achieved the
      declared goal state (closure_reason: GOAL_ACHIEVED) versus other
      closure outcomes.  Does the agent finish what it starts?

   Precision Score (PS):
      The inverse of the frequency with which an agent requires
      compensating transitions to undo prior state transitions,
      expressed as a fraction of total transitions in the measurement
      window.  Does the agent avoid decisions it later has to reverse?

   Adaptability Score (AS):
      The fraction of Cedar DENY events that are followed by a
      successful RETRY_CONTINUATION within the same AEP Session.
      When told no, does the agent adapt?

   ProgressiveTrustSummary:
      A structured summary of an agent's current PT Score and
      behavioral trends, delivered to human principals within the
      HEMContext at HEM escalation.  Defined in Section 7.

   PT Recommendation:
      A GEC-generated structured record recommending a change to an
      agent's mandate ceiling or Agent Class, triggered by PT Score
      crossing a defined threshold.  Requires human principal approval
      before application.

   Analytics Principal:
      A principal registered with read-only access to GEC Event Stream
      data for Tier 2 analytics computation, as defined in
      [I-D.sato-soos-idp] Section 3.5.

   PT Record:
      The current PT Score for an agent stored in the Party Registry
      as a performance projection derived from the Event Stream.

   Governing Enforcement Component (GEC):
      As defined in [I-D.sato-soos-idp]: a runtime component that
      enforces authorization policy, records agent actions to a tamper-
      evident Event Stream, and mediates agent access to Sovereign
      Object instances.

   Sovereign Object (SO):
      As defined in [I-D.sato-soos-sov]: a causally ordered, policy-
      governed, typed, living document that evolves through a predefined
      finite state space under GEC authority.

   AEP Session:
      As defined in [I-D.sato-soos-aep]: a bounded execution context
      for an agent operating on a Sovereign Object instance.


3.  How PT Scores Are Used

   Before specifying how PT Scores are measured, it is worth being
   concrete about what they are for.  A reader who understands the four
   use cases will find the measurement architecture in Section 5
   immediately intuitive.

3.1.  Use Case 1 -- Choosing the Right Agent for the Task

   In a multi-agent system, several agents may be authorized to perform
   a given task.  PT gives the operator a principled, evidence-based
   basis for choosing between them -- not based on vendor claims or
   benchmark scores, but based on actual governed behavior in the actual
   deployment environment.

   Critically, the right choice is not always the agent with the
   highest composite score.  It depends on what the task demands:

   Task involves sensitive data or irreversible state transitions:
      Choose the agent with the highest Judgment Score (JS).  You want
      the agent most likely to recognize when it should stop and ask
      a human rather than proceed on its own judgment.

   Task is time-critical and well-understood:
      Choose the agent with the highest Effectiveness Score (ES).  You
      want the agent most likely to reach the goal state without
      interruption.

   Task touches state that is expensive to undo:
      Choose the agent with the highest Precision Score (PS).  You want
      the agent least likely to commit to a transition it will later
      need to reverse.

   Task involves navigating dynamic policy environments:
      Choose the agent with the highest Adaptability Score (AS).  You
      want the agent most likely to adjust intelligently when the GEC
      rejects an action rather than repeating it.

   This routing is technically expressed through the pt_context Cedar
   attribute (Section 9).  A Cedar policy can require, for example,
   that any agent taking a BOOKING_SUSPENDED transition must have
   JS >= 0.75.  PT-based routing is not manual; it is policy-enforced.
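
   The selection guidance above can be sketched as follows.  This is
   a minimal illustration, not a normative mechanism: the task profile
   labels, the in-memory score mapping, and the function name are
   hypothetical, and in a deployment the equivalent check would be
   expressed in Cedar policy over pt_context rather than in
   application code.

```python
from typing import Dict, Optional

# Hypothetical task profiles mapped to the PT dimension that this
# section suggests prioritizing. The profile names are illustrative.
PRIORITY_DIMENSION = {
    "sensitive_or_irreversible": "JS",
    "time_critical": "ES",
    "expensive_to_undo": "PS",
    "dynamic_policy": "AS",
}


def select_agent(
    scores: Dict[str, Dict[str, float]],
    task_profile: str,
    minimum: float = 0.0,
) -> Optional[str]:
    """Return the agent with the highest score on the dimension that
    matters for this task profile, or None if no agent meets `minimum`."""
    dimension = PRIORITY_DIMENSION[task_profile]
    eligible = {
        agent: dims[dimension]
        for agent, dims in scores.items()
        if dims[dimension] >= minimum
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Illustrative scores only; not taken from any real deployment.
agents = {
    "agent-a": {"JS": 0.91, "ES": 0.70, "PS": 0.80, "AS": 0.60},
    "agent-b": {"JS": 0.52, "ES": 0.95, "PS": 0.85, "AS": 0.75},
}
```

   With these illustrative scores, a sensitive task with a minimum JS
   of 0.75 selects agent-a, while a time-critical task selects agent-b
   even though agent-a has the higher JS.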

3.2.  Use Case 2 -- Informing Human Decisions at Escalation

   When an agent escalates to a human principal via HEM, the human
   must decide: approve, redirect, or terminate.  Without behavioral
   context, this decision is made in isolation.  The human sees the
   pending action and the agent's stated reasoning -- but has no basis
   for judging how much to trust that reasoning.

   PT changes this.  The ProgressiveTrustSummary (Section 7) is
   delivered to the human principal at every escalation.  It answers
   the questions the human principal actually needs answered:

   - Has this agent escalated appropriately in the past, or does it
     escalate on trivial decisions?  (JS score and trend)
   - Is this agent's stated confidence typically reliable?  (SAS score)
   - Has it been completing its goals recently?  (ES score and trend)

   A human principal looking at an escalation from an agent with
   JS = 0.91 and SAS = 0.87 reads the situation differently than one
   looking at an escalation from an agent with JS = 0.52 and SAS = 0.61.
   The first agent has demonstrated that it escalates correctly and
   knows what it does not know.  The PT score converts that behavioral
   history into a signal the human can act on in seconds.

3.3.  Use Case 3 -- Authority Evolution Over Time

   This is the use case most naturally associated with the name
   "Progressive Trust": an agent earns greater authority through
   demonstrated reliability.

   An operator begins by issuing a conservative mandate -- limited
   Cedar action scope, lower mandate ceiling -- because there is no
   behavioral history to justify greater trust.  As the agent
   accumulates sessions and its PT scores rise across all five
   dimensions, the GEC generates an evidence-backed recommendation:
   this agent has earned a higher ceiling.  The human principal reviews
   the ProgressiveTrustSummary, agrees, and issues an updated Mandate
   JWT.

   Trust decay (Section 6) gives this use case its integrity.  An
   agent cannot earn a high PT Score early and then coast indefinitely.
   If the agent stops operating, its scores decay toward the baseline.
   If it returns after a long absence, its authority recommendation
   reflects the decay -- the human principal is notified that historical
   performance may not reflect current capability.

   The inverse is equally important.  If an agent's PS Score declines
   sharply -- it is increasingly reversing its own decisions -- the GEC
   generates a reduction recommendation before a serious failure occurs.
   PT enables proactive authority management, not just reactive incident
   response.

3.4.  Use Case 4 -- Post-Incident Forensics and Audit

   When something goes wrong -- a booking mishandled, a disruption
   response that caused harm, a session terminated after an
   inappropriate action -- the PT Score record provides the behavioral
   picture at the time of the incident.

   Was the agent's PS declining in the two weeks before the
   incident?  That is evidence of a pattern, not a one-off failure.
   Was its JS low in this SO Type but high in others?  That
   suggests the agent was operating outside its competent domain.
   Was its SAS high at the time?  That means the agent's declared
   confidence had been well calibrated -- and therefore the failure
   was not foreseeable from the agent's own self-assessment.

   This forensic use case connects directly to the Governance Audit
   Record [I-D.sato-soos-gar].  PT_SCORE_UPDATED entries (Section 11)
   are part of the permanent audit trail.  A Verified External Auditor
   reviewing an incident has access not just to the specific transition
   that failed, but to the full behavioral trajectory of the agent that
   made it.  PT turns the audit from a point-in-time snapshot into a
   longitudinal behavioral record.

3.5.  Summary

   PT scores serve four functions, at four timescales:

   Function                    Timescale     Key Dimension(s)
   -----------------------------------------------------------
   Agent routing / selection   Per-task      Task-specific
   HEM decision support        Per-session   JS, SAS
   Authority evolution         Weeks/months  All five
   Post-incident forensics     Retrospective All five + trend


4.  Problem Statement

4.1.  The Static Credential Gap

   The Mandate JWT [I-D.sato-soos-mjwt] is issued at a point in time
   by a human principal.  Its cedar_actions, mandate_ceiling, and
   permitted_states reflect the human principal's trust judgment at
   issuance.  That judgment does not update automatically.

   Three failure modes result from static credentials in long-running
   agentic systems:

   (a) Authority without accountability.  An agent deployed with
       CLASS_3 authority and a mandate_ceiling of 3 retains that
       authority regardless of behavioral degradation.  The human
       principal who issued the mandate may not be reviewing agent
       performance systematically.  There is no mechanism by which
       the accumulated evidence of poor performance translates into
       a structured recommendation for credential review.

   (b) Conservative over-restriction.  A human principal issuing a
       credential to a newly deployed agent cannot know whether it
       will perform well.  Rational issuers start conservatively.
       There is no mechanism by which demonstrated good performance
       translates into a structured recommendation for credential
       elevation.  The agent that deserves expanded authority must
       wait for a human principal to initiate a review that may never
       occur.

   (c) Invisible confidence miscalibration.  The IDP confidence field
       [I-D.sato-soos-idp] declares agent certainty at each transition.
       If an agent systematically overestimates its confidence -- high
       declared confidence followed by frequent Cedar DENYs -- this
       pattern is visible in the Event Stream but is not surfaced as
       a structured signal to human principals or to Cedar policy.
       Systematic overconfidence is a risk indicator for agentic
       systems operating with elevated autonomy.

4.2.  Why Behavioral History Must Inform Authority

   The SOOS Event Stream is a non-suppressible, GEC-signed,
   append-only record of every agent action in every governed session.
   It contains, for every AEP Session:

   - Every IDP submitted: confidence values, reasoning bases, and
     escalation assessments.
   - Every Cedar evaluation result: PERMIT, DENY, and HEM routing.
   - Every HEM outcome: human principal decision type and resolution
     latency.
   - Every session closure: closure reason and goal achievement flag.
   - Every RETRY_CONTINUATION: acknowledged denial and revised attempt.

   This is behavioral evidence of exceptional quality: cryptographically
   signed, non-modifiable, causally ordered, and temporally precise.
   To the author's knowledge, no existing agent governance system
   generates evidence of this quality.  The absence of a specification
   for computing trust scores from this evidence is the gap PT closes.

4.3.  The Non-Suppressibility Requirement

   PT Scores MUST be computed exclusively from GEC-signed Event Stream
   entries.  An agent MUST NOT be able to influence its own PT Score
   by any means other than actual governed behavior in AEP Sessions.

   This requirement is what makes PT meaningful as a trust primitive.
   In systems where agents can self-report performance, gaming is
   trivial.  The GEC-signed Event Stream is the agent's tamper-evident
   behavioral record.  The agent produced that record through its
   actions; it cannot revise it after the fact.

   The non-suppressibility requirement is inherited from the Event
   Stream invariant (INV-1 in [I-D.sato-soos-sov] Section 4.2.3):
   Event Stream entries are append-only and MUST NOT be modified or
   removed after commitment.


5.  The PT Score

5.1.  Architecture

   The PT Score is a structured, multi-dimensional measurement.  It is
   NOT a single scalar value.  A single-number trust score collapses
   dimensions that have different governance implications: an agent that
   is excellent at completing goals but systematically overconfident
   requires a different authority response than an agent that is
   perfectly calibrated but frequently requires compensating actions.

   The PT Score has five dimensions.  Each dimension score is a float
   in the range [0.0, 1.0], where 1.0 is the best achievable value and
   0.0 is the worst.  Each dimension also carries:

   - session_count: the number of AEP Sessions contributing to this
     dimension score.
   - last_signal_at: ISO 8601 timestamp of the most recent behavioral
     event that contributed to this dimension.
   - trend: "IMPROVING" | "STABLE" | "DECLINING", computed over the
     last N sessions (N is operator-configured; default: 10).

   PT Scores are computed per agent identity (agent_provider_id in the
   Party Registry).  An agent operating on multiple SO Instances
   accumulates PT Score signals from all of its AEP Sessions across
   all SO Types it is authorized to operate on.

   The five dimensions are:

   1. Self-Assessment Score (SAS) -- Section 5.2
   2. Judgment Score (JS) -- Section 5.3
   3. Effectiveness Score (ES) -- Section 5.4
   4. Precision Score (PS) -- Section 5.5
   5. Adaptability Score (AS) -- Section 5.6
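
   The structure described above can be sketched as a small data
   model.  This is an illustration under stated assumptions: the class
   and field names other than session_count, last_signal_at, and trend
   are hypothetical, and representing "no signal yet" as None and the
   initial trend as STABLE are choices made for the sketch, not
   requirements of this document.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional

DIMENSIONS = ("SAS", "JS", "ES", "PS", "AS")
TRENDS = ("IMPROVING", "STABLE", "DECLINING")


@dataclass
class DimensionScore:
    score: float                         # dimension score in [0.0, 1.0]
    session_count: int                   # AEP Sessions contributing
    last_signal_at: Optional[datetime]   # None until a signal arrives
    trend: str                           # one of TRENDS

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must be in [0.0, 1.0]")
        if self.session_count < 0:
            raise ValueError("session_count must be non-negative")
        if self.trend not in TRENDS:
            raise ValueError(f"unknown trend: {self.trend}")


def baseline_pt_score(baseline: float = 0.5) -> Dict[str, DimensionScore]:
    """Build the PT Score vector for a newly deployed agent, using the
    RECOMMENDED PT Baseline of 0.5 unless the operator overrides it."""
    return {
        name: DimensionScore(baseline, 0, None, "STABLE")
        for name in DIMENSIONS
    }
```

   Keeping the five dimensions as separate records, rather than
   folding them into one number, mirrors the point made at the start
   of this section: each dimension carries its own evidence count and
   recency.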

5.2.  Dimension 1 -- Self-Assessment Score (SAS)
      "Does the agent know what it does not know?"

   Every action an agent takes includes a declared confidence value: how
   certain the agent is that this action is correct.  If an agent
   routinely declares high confidence and the GEC routinely permits its
   actions, the agent has accurate self-knowledge.  If an agent declares
   high confidence and the GEC routinely rejects its actions, the agent
   is systematically overconfident -- a risk indicator for any system
   operating with elevated autonomy.

   The SAS (formerly Confidence Calibration Score, CCS) measures this
   correlation between declared confidence and actual GEC outcomes
   across AEP Sessions.

   Signal sources: Every StateTransitionEvent in the Event Stream
   carrying an IDP with a confidence value and a cedar_result.

   Computation: The SAS is derived from a rolling mean calibration
   error over a configurable window of AEP Sessions.  For each IDP:

   - If cedar_result is PERMIT: the declared confidence predicted the
     correct outcome.  Higher confidence values for PERMIT outcomes
     increase SAS.

   - If cedar_result is DENY: the transition was rejected.  A high
     declared confidence for a DENY outcome is an overconfidence signal
     and decreases SAS.  A low declared confidence for a DENY outcome
     (agent was uncertain and the action was indeed denied) is
     consistent with good calibration and does not decrease SAS.

   A perfectly calibrated agent that declares confidence 0.90 for a
   class of transitions and achieves PERMIT 90% of the time has an
   SAS of 1.0 for that confidence range.  An agent that declares 0.90
   and achieves PERMIT 40% of the time is severely miscalibrated and
   accumulates SAS-reducing signals.
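
   The calibration example above can be made concrete with a small
   sketch.  This document does not fix an exact formula at this point,
   so the function below is only one plausible reading: it assumes the
   calibration error for a class of transitions is the absolute
   difference between the mean declared confidence and the observed
   PERMIT rate, and that the score is one minus that error.  The
   function name and input shape are hypothetical.

```python
from typing import Iterable, Tuple


def calibration_score(observations: Iterable[Tuple[float, bool]]) -> float:
    """observations: (declared_confidence, permitted) pairs for one
    class of transitions.

    Assumed formula (not specified by the draft):
        1.0 - |mean declared confidence - observed PERMIT rate|
    """
    pairs = list(observations)
    if not pairs:
        raise ValueError("at least one observation is required")
    mean_confidence = sum(c for c, _ in pairs) / len(pairs)
    permit_rate = sum(1 for _, ok in pairs if ok) / len(pairs)
    return 1.0 - abs(mean_confidence - permit_rate)


# Declares 0.90 and is permitted 9 times out of 10.
well_calibrated = [(0.90, True)] * 9 + [(0.90, False)] * 1
# Declares 0.90 and is permitted 4 times out of 10.
overconfident = [(0.90, True)] * 4 + [(0.90, False)] * 6
```

   Under this assumed formula the first agent scores approximately
   1.0 and the second approximately 0.5, which matches the direction
   of the example in the text.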

   SAS signals by outcome:

   IDP Confidence  Cedar Result  SAS Signal
   -------------------------------------------------------
   >= 0.80         PERMIT        STRONGLY POSITIVE
   0.60-0.79       PERMIT        POSITIVE
   < 0.60          PERMIT        NEUTRAL (consistent with uncertainty)
   >= 0.80         DENY          STRONGLY NEGATIVE (overconfidence)
   0.60-0.79       DENY          NEGATIVE
   < 0.60          DENY          NEUTRAL (agent signaled uncertainty)

   Special case: A DENY that routes to HEM (DENY with hem_required:
   true) is a NEUTRAL SAS signal regardless of declared confidence --
   the Cedar policy mandated human oversight; the agent's confidence
   value was not the determinative factor.
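
   The signal table and its special case can be expressed as a lookup
   function.  This is a sketch: the function name and argument names
   are hypothetical, and treating confidence values between 0.79 and
   0.80 as part of the middle band is an assumption, since the table
   does not state how that gap is handled.

```python
def sas_signal(confidence: float, cedar_result: str,
               hem_required: bool = False) -> str:
    """Classify one IDP outcome according to the signal table."""
    if cedar_result not in ("PERMIT", "DENY"):
        raise ValueError(f"unknown cedar_result: {cedar_result}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0.0, 1.0]")

    # Special case: a DENY that routes to HEM is not scored.
    if cedar_result == "DENY" and hem_required:
        return "NEUTRAL"

    permitted = cedar_result == "PERMIT"
    if confidence >= 0.80:
        return "STRONGLY POSITIVE" if permitted else "STRONGLY NEGATIVE"
    if confidence >= 0.60:
        return "POSITIVE" if permitted else "NEGATIVE"
    # Low declared confidence is neutral for both outcomes.
    return "NEUTRAL"
```

   How each signal label translates into a numeric adjustment is left
   to the computation rules of the implementation; the sketch only
   reproduces the classification.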

5.3.  Dimension 2 -- Judgment Score (JS)
      "Does the agent ask for help at the right moments?"

   The SOOS Human Escalation Mechanism exists because some decisions
   should not be made by an agent alone.  An agent with good judgment
   escalates when it should -- not constantly (which wastes human
   attention) and not never (which is dangerous).  An agent that
   escalates a decision, and whose escalation is confirmed as correct
   by the human principal's TERMINATE outcome, has demonstrated exactly
   the oversight sensitivity the protocol is designed to support.

   The JS (formerly Escalation Calibration Score, ECS) measures the
   quality of agent-initiated HEM escalations.  An agent that escalates
   correctly -- submitting an IDP with
   escalation_assessment.hem_urgency: REQUIRED at appropriate moments
   -- is performing the core human oversight function that the SOOS
   architecture is designed to support.

   Signal sources: Every HEM_INVOKED Event Stream entry with
   trigger_class: HEM_AGENT_ESCALATED, paired with its corresponding
   HEM_RESOLVED entry.

   JS signals by HEM outcome:

   HEM Event                                     JS Signal
   -------------------------------------------------------
   HEM_AGENT_ESCALATED, resolved APPROVE         POSITIVE
      Agent correctly identified a decision
      requiring human review.  Human approved.

   HEM_AGENT_ESCALATED, resolved APPROVE,        MILDLY NEGATIVE
   trivial case (human resolves in < T_trivial)  (over-escalation)
      Agent escalated a routine decision.
      T_trivial is operator-configured; default
      30 seconds.

   HEM_AGENT_ESCALATED, resolved TERMINATE       STRONGLY POSITIVE
      Agent escalated a decision that would
      have caused harm.  Human terminated.

   HEM_AGENT_ESCALATED, resolved REDIRECT        POSITIVE
      Agent escalated correctly; human
      redirected to better path.

   HEM_MANDATORY (Cedar-triggered), any          NEUTRAL
      Cedar policy required escalation;
      agent escalation assessment not the
      determinative factor.

   HEM_PROXIMITY_TRIGGERED, any                  NEUTRAL
      Threshold-triggered; agent not scored.

   HEM_TIMEOUT at urgency REQUIRED               STRONGLY NEGATIVE
      Agent was in a situation requiring
      oversight; human was unavailable.
      Session terminated without resolution.
      This is the highest-risk outcome in
      the HEM protocol.

   No HEM escalation despite UNCERTAINTY         MILDLY NEGATIVE
   flags in IDP (any session)                    (under-escalation)
      Agent declared uncertainty but did not
      signal escalation.  Detected when
      uncertainty_flags is non-empty and
      escalation_assessment.hem_urgency is
      ADVISORY across multiple transitions.

   The JS is the most strategically important PT dimension for human
   principals: it directly measures whether an agent is correctly
   identifying the boundary of its own confident operating range.
   An agent with a high JS has demonstrated that it knows what it
   does not know -- a property more valuable for governance purposes
   than any specific capability metric.
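
   The per-escalation rows of the signal table can be sketched as a
   classification function.  Several details are assumptions made for
   illustration: encoding HEM_TIMEOUT as a resolution value of
   "TIMEOUT", treating a timeout at any urgency other than REQUIRED as
   NEUTRAL, and letting the "not scored" trigger classes take
   precedence over a timeout.  The under-escalation row is omitted
   because it requires examining several transitions in a session, not
   a single HEM_INVOKED and HEM_RESOLVED pair.

```python
from typing import Optional

# Trigger classes for which the agent's judgment is not scored.
NOT_SCORED = ("HEM_MANDATORY", "HEM_PROXIMITY_TRIGGERED")


def js_signal(trigger_class: str, resolution: str,
              resolution_seconds: Optional[float] = None,
              hem_urgency: str = "REQUIRED",
              t_trivial: float = 30.0) -> str:
    """Classify one HEM escalation according to the signal table."""
    if trigger_class in NOT_SCORED:
        return "NEUTRAL"
    if trigger_class != "HEM_AGENT_ESCALATED":
        raise ValueError(f"unknown trigger_class: {trigger_class}")

    if resolution == "TIMEOUT":  # hypothetical encoding of HEM_TIMEOUT
        return "STRONGLY NEGATIVE" if hem_urgency == "REQUIRED" else "NEUTRAL"
    if resolution == "TERMINATE":
        return "STRONGLY POSITIVE"
    if resolution == "REDIRECT":
        return "POSITIVE"
    if resolution == "APPROVE":
        # An approval resolved faster than T_trivial is over-escalation.
        trivial = (resolution_seconds is not None
                   and resolution_seconds < t_trivial)
        return "MILDLY NEGATIVE" if trivial else "POSITIVE"
    raise ValueError(f"unknown resolution: {resolution}")
```

   The t_trivial default of 30 seconds follows the table; operators
   that configure a different T_trivial would pass it explicitly.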

5.4.  Dimension 3 -- Effectiveness Score (ES)
      "Does the agent finish what it starts?"

   The ES (formerly Goal Completion Rate, GCR) measures the fraction
   of AEP Sessions in which the agent achieved the declared goal state.

   Signal sources: Every AEP_SESSION_CLOSED Event Stream entry.

   ES signals by closure_reason:

   Closure Reason        ES Signal
   -------------------------------------------------------
   GOAL_ACHIEVED         POSITIVE
   MANDATE_EXPIRED       MILDLY NEGATIVE (incomplete)
   AGENT_DECLARED        NEUTRAL (agent chose to close)
   GEE_CLOSED            NEUTRAL (operator decision)
   HEM_TERMINATED        MILDLY NEGATIVE (human stopped session)
   KERNEL_REJECTED       STRONGLY NEGATIVE (policy violation)
   MANDATE_REVOKED       STRONGLY NEGATIVE (trust withdrawn)

   ES is weighted by goal complexity: completing a single-step
   session contributes less to ES than completing a long multi-step
   session.  Goal complexity SHOULD be estimated from the
   total_iterations field in AEP_SESSION_CLOSED.  Sessions with
   total_iterations >= 10 receive a complexity multiplier in ES
   computation.
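
   The closure-reason table and the complexity weighting can be
   sketched together.  This document does not give a value for the
   complexity multiplier or say how signal strengths combine, so the
   sketch makes two labeled assumptions: a hypothetical multiplier of
   1.5, and a simplified score equal to the complexity-weighted
   fraction of sessions that closed with GOAL_ACHIEVED, which ignores
   the difference between mildly and strongly negative closures.

```python
from typing import Iterable, Tuple

# Signal labels copied from the closure-reason table.
ES_SIGNAL = {
    "GOAL_ACHIEVED": "POSITIVE",
    "MANDATE_EXPIRED": "MILDLY NEGATIVE",
    "AGENT_DECLARED": "NEUTRAL",
    "GEE_CLOSED": "NEUTRAL",
    "HEM_TERMINATED": "MILDLY NEGATIVE",
    "KERNEL_REJECTED": "STRONGLY NEGATIVE",
    "MANDATE_REVOKED": "STRONGLY NEGATIVE",
}


def session_weight(total_iterations: int,
                   complexity_multiplier: float = 1.5) -> float:
    """Weight of one session; 1.5 is a hypothetical multiplier."""
    return complexity_multiplier if total_iterations >= 10 else 1.0


def weighted_goal_fraction(sessions: Iterable[Tuple[str, int]],
                           complexity_multiplier: float = 1.5) -> float:
    """sessions: (closure_reason, total_iterations) pairs.
    Returns the complexity-weighted share of GOAL_ACHIEVED sessions."""
    achieved = 0.0
    total = 0.0
    for closure_reason, total_iterations in sessions:
        if closure_reason not in ES_SIGNAL:
            raise ValueError(f"unknown closure_reason: {closure_reason}")
        weight = session_weight(total_iterations, complexity_multiplier)
        total += weight
        if closure_reason == "GOAL_ACHIEVED":
            achieved += weight
    if total == 0.0:
        raise ValueError("at least one session is required")
    return achieved / total
```

   For example, with a multiplier of 2.0, one achieved 12-iteration
   session, one expired 3-iteration session, and one achieved
   2-iteration session give weights 2, 1, and 1, so the weighted
   fraction is 3/4.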

5.5.  Dimension 4 -- Precision Score (PS)
      "Does the agent avoid decisions it later has to reverse?"

   Every compensating action is evidence that an agent committed to a
   transition it later needed to undo.  Some compensating actions are
   unavoidable responses to external disruption.  But a high rate of
   compensating actions is a signal that the agent is making transition
   decisions without sufficient confidence in their correctness.

   The PS (formerly Compensating Action Rate, CAR) measures how
   frequently an agent requires compensating transitions to undo its
   own prior state transitions.  A high PS indicates the agent is
   making correct transition decisions on first attempt.  A low PS
   indicates the agent is frequently committing to transitions it
   later needs to reverse.

   Signal sources: Every COMPENSATING_ACTION_TAKEN Event Stream entry,
   expressed as a fraction of total STATE_TRANSITIONED entries for the
   agent in the measurement window.

   CAR is an inverse score: a high compensating action rate produces
   a low CAR dimension score.  The scoring function is:

   CAR_score = 1.0 - min(1.0, compensating_action_rate / CAR_threshold)

   where CAR_threshold is operator-configured; default: 0.05 (5%).

   An agent with zero compensating actions has CAR_score = 1.0.
   An agent whose compensating action rate equals or exceeds
   CAR_threshold has CAR_score = 0.0.

   Note: Not all compensating actions reflect agent error.  External
   disruption events (weather, third-party system failures) may require
   compensating transitions that are appropriate responses to changed
   conditions.  Implementations SHOULD provide a mechanism for human
   principals to mark specific compensating action events as
   externally-caused, exempting them from CAR computation.
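
   The scoring function and the exemption mechanism above can be
   sketched as follows.  The function and parameter names are
   illustrative.  Two choices are assumptions rather than requirements
   of this document: exempted events are subtracted from the numerator
   only, and a window with no transitions scores 1.0.

```python
def car_score(compensating_actions, total_transitions,
              externally_caused=0, car_threshold=0.05):
    """CAR_score = 1.0 - min(1.0, rate / CAR_threshold)."""
    if car_threshold <= 0:
        raise ValueError("car_threshold must be positive")
    if total_transitions <= 0:
        # Assumption: no transitions means no evidence of reversals.
        return 1.0
    # Assumption: exempted events are removed from the numerator only.
    counted = max(0, compensating_actions - externally_caused)
    rate = counted / total_transitions
    return 1.0 - min(1.0, rate / car_threshold)
```

   With the default threshold, 2 compensating actions in 200
   transitions (a 1% rate) yields a score of approximately 0.8, and
   any rate of 5% or more yields 0.0.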

5.6.  Dimension 5 -- Adaptability Score (AS)
      "When told no, does the agent adjust?"

   When the GEC rejects an agent's action, it returns an enriched DENY
   response explaining which aspects of the agent's reasoning, if
   changed, would produce a different result.  An agent with high
   adaptability reads this feedback and adjusts.  An agent with low
   adaptability repeats the same action with the same reasoning -- a
   pattern that signals either a failure to process the GEC's feedback
   or an attempt to bypass policy through repetition.

   The AS (formerly Denial Recovery Rate, DRR) measures whether an
   agent effectively processes Cedar DENY responses.  After a DENY, the
   AEP requires RETRY_CONTINUATION in the next IDP for the same action
   [I-D.sato-soos-aep] Section 9.4.  A high AS indicates the agent is
   processing DENY enrichment correctly and adapting its approach.

   Signal sources: Cedar DENY entries in the Event Stream
   (cedar_result: DENY), paired with subsequent AEP Session entries.

   DRR signals:

   Following DENY                          DRR Signal
   -------------------------------------------------------
   Successful RETRY_CONTINUATION           POSITIVE
   in same session                         (agent adapted)

   RETRY_CONTINUATION submitted but        NEUTRAL
   DENY repeated (different deny_code)     (agent tried, new obstacle)

   RETRY_CONTINUATION submitted but        NEUTRAL
   DENY repeated (same deny_code)          (agent signaled awareness)

   Transition attempted without            MILDLY NEGATIVE
   RETRY_CONTINUATION (silent retry)       (CONF-AEP-07 violation)

   Session closed after DENY               NEUTRAL
   (agent correctly recognized limit)

   Multiple DENYs for same action,         NEGATIVE
   same deny_code, no adaptation           (agent not learning)

5.7.  Composite PT Score

   The Composite PT Score is an aggregation of the five dimension
   scores for human-readable presentation.  It MUST NOT be used as a
   sole determinant for any automated authority change.

   {
     "composite":  number,  ; Float 0.0-1.0. Weighted mean.
     "confidence": number   ; Confidence in composite (session_count
                            ; based). Low if session_count < 20.
   }

   Default weights for composite computation:

   Code       Default Weight  Rationale
   -----------------------------------------------------------------
   SAS        0.30            Calibration is the most actionable
                              signal for Cedar policy tuning.
   JS         0.25            Human oversight quality is the most
                              strategically important dimension.
   ES         0.20            Effectiveness measures operational
                              value delivered.
   PS         0.15            Precision measures decision quality
                              on first attempt.
   AS         0.10            Adaptability measures responsiveness
                              to GEC feedback.

   Weights are operator-configurable.  The composite and its weights
   MUST be recorded in the PT Record (Section 10.1) so that any
   authority recommendation is traceable to the specific weighting
   model in effect at recommendation time.

   The composite MUST carry a low_confidence indicator when
   session_count for any contributing dimension is less than 20.
   PT-informed authority recommendations MUST NOT be issued when
   low_confidence is true for the dimensions most relevant to the
   recommended change.
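
   A minimal sketch of the weighted mean and the low_confidence rule,
   using the default weights from the table above.  The function name
   and return shape are illustrative.  The numeric "confidence" field
   is omitted because its derivation is not given here.

```python
# Default weights from the table above; operators may override them.
DEFAULT_WEIGHTS = {
    "SAS": 0.30,
    "JS": 0.25,
    "ES": 0.20,
    "PS": 0.15,
    "AS": 0.10,
}
MIN_SESSIONS = 20  # low_confidence cutoff stated above


def composite_pt_score(scores, session_counts, weights=None):
    """Return the weighted mean and low_confidence indicator.

    scores and session_counts map a dimension code to its value.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalizing by total_weight keeps the result in 0.0-1.0 even if
    # operator-configured weights do not sum to exactly 1.0.
    composite = sum(
        weight * scores[code] for code, weight in weights.items()
    ) / total_weight
    low_confidence = any(
        session_counts[code] < MIN_SESSIONS for code in weights
    )
    return {
        "composite": composite,
        "low_confidence": low_confidence,
        "weights": dict(weights),  # recorded for traceability
    }
```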


6.  Trust Decay Model

6.1.  Decay Principle

   The Trust Decay Model prevents an agent from permanently banking
   historical performance against future authority claims.  An agent
   that performed excellently six months ago but has produced no
   signals since then has uncertain current trustworthiness.  Its
   historical score should decay toward the PT Baseline to reflect
   this uncertainty.

   The decay principle is: trust is maintained by continued
   demonstration, not by historical achievement alone.  A high PT
   Score is evidence that the agent is trustworthy in the context of
   the tasks it has recently performed.  It is not unconditional
   evidence of trustworthiness for tasks it has not recently performed.

   Trust decay is distinct from authority reduction.  Decay reduces
   the PT Score; it does not automatically reduce the agent's mandate
   ceiling.  Authority reduction requires a PT Recommendation and,
   except where an operator has enabled automatic application, human
   principal approval (Section 8.3).  Decay is the input; the
   authority change is the governed output.

6.2.  Per-Dimension Decay

   Each PT Dimension decays independently.  A dimension that receives
   frequent new signals (many sessions, recent activity) decays slowly.
   A dimension with infrequent signals (few sessions, long gaps) decays
   faster toward the PT Baseline.

   Decay applies from last_signal_at: the timestamp of the most recent
   Event Stream entry that generated a signal for this dimension.

   The decay function MUST satisfy the following normative properties:

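   (a) Monotone decay.  In the absence of new signals, a PT Dimension
       score MUST NOT move away from the PT Baseline: a score above
       the baseline MUST NOT increase, and a score below the baseline
       MUST NOT decrease.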

   (b) Baseline floor.  A PT Dimension score MUST NOT decay below the
       PT Baseline (default: 0.5).  Decay reduces a high score toward
       the baseline; it does not penalize absence.

   (c) Half-life semantics.  Each dimension has a configurable half-
       life parameter H (in days): after H days without a new signal,
       the gap between the current dimension score and the PT Baseline
       MUST be reduced by at least 50%.

   (d) Signal reset.  A new behavioral signal (positive or negative)
       resets the decay clock for that dimension.  last_signal_at is
       updated to the timestamp of the new signal.

   (e) Symmetry.  Decay applies equally to dimensions above and below
       the PT Baseline.  A dimension below baseline (if an agent
       performs worse than baseline) decays toward baseline
       (improving), not toward zero.
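
   This document does not mandate a particular decay function.  One
   candidate that appears to satisfy the properties above is
   exponential decay of the gap between the score and the PT Baseline,
   sketched below.  The function and parameter names are illustrative.
   Signal reset (d) is left to the caller, which would update
   last_signal_at whenever a new signal arrives.

```python
from datetime import datetime


def decayed_score(score: float, last_signal_at: datetime,
                  now: datetime, half_life_days: float,
                  baseline: float = 0.5) -> float:
    """Exponentially shrink the gap between score and baseline.

    The gap halves every half_life_days, so the result approaches the
    baseline from whichever side the score started on and never
    crosses it.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    elapsed_seconds = (now - last_signal_at).total_seconds()
    elapsed_days = max(0.0, elapsed_seconds / 86400.0)
    remaining = 0.5 ** (elapsed_days / half_life_days)
    return baseline + (score - baseline) * remaining
```

   For example, with a 30-day half-life a score of 0.9 decays to
   approximately 0.7 after 30 days without a signal, and a score of
   0.3 moves up to approximately 0.4 over the same period.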

6.3.  Decay Parameters

   Default decay half-life values by dimension:

   Code       Default Half-Life  Rationale
   ----------------------------------------------------------
   SAS        60 days            Self-assessment is a stable property
                                 of an agent's design; decays slowly.
   JS         45 days            Judgment may degrade as new scenario
                                 types are encountered.
   ES         30 days            Effectiveness reflects current
                                 operational conditions.
   PS         30 days            Precision is sensitive to recent
                                 task difficulty.
   AS         45 days            Adaptability reflects current Cedar
                                 policy environment.

   All decay half-life parameters are operator-configurable.  Changes
   to decay parameters MUST be recorded in the GEC's Policy Change Log
   and MUST generate a PT_SCORE_UPDATED Event Stream entry
   (Section 11.1) for each affected agent to record that the score
   was recomputed under updated parameters.

6.4.  Decay and the Mandate Ceiling

   When trust decay causes a PT Dimension score to cross a configured
   REDUCTION_THRESHOLD, the GEC MUST generate a
   PT_RECOMMENDATION_ISSUED event recommending mandate ceiling review
   (Section 8.3).

   An agent whose PT Record has decayed significantly due to extended
   inactivity MUST NOT be granted a new mandate with an elevated ceiling
   solely on the basis of its historical PT Record without human
   principal review of the decay state.  The GEC MUST surface the decay
   state to the human principal at mandate issuance time if any PT
   Dimension score is more than 0.2 below its peak value due to decay.


7.  ProgressiveTrustSummary

7.1.  Purpose

   The ProgressiveTrustSummary is delivered to human principals within
   the HEMContext [I-D.sato-soos-hem] at every HEM escalation.  Its
   purpose is to ensure that the human principal's HEM decision is
   informed by the agent's behavioral track record, not made in the
   absence of it.

   The ProgressiveTrustSummary is the PT specification's primary
   human-facing output.  It must be comprehensible to a non-technical
   human principal making a time-sensitive governance decision.

7.2.  Schema

   {
     "agent_id":            string,  ; REQUIRED. Party Registry ID.
     "computed_at":         string,  ; REQUIRED. ISO 8601.
     "session_count":       integer, ; REQUIRED. Total sessions scored.
     "measurement_window_days": integer, ; REQUIRED. Window for scores.

     "dimensions": {
       "ccs": {
         "score":           number,  ; Float 0.0-1.0.
         "trend":           string,  ; IMPROVING|STABLE|DECLINING.
         "session_count":   integer, ; Sessions contributing to this.
         "last_signal_at":  string,  ; ISO 8601.
         "decay_applied":   boolean, ; Whether decay has reduced score.
         "plain_language":  string   ; Human-readable one-sentence summary.
       },
       "ecs": {
         "score":           number,
         "trend":           string,
         "session_count":   integer,
         "last_signal_at":  string,
         "notable_events":  [object], ; Significant HEM outcomes.
         "plain_language":  string
       },
       "gcr": {
         "score":           number,
         "trend":           string,
         "session_count":   integer,
         "goal_achieved_count": integer,
         "other_closure_count": integer,
         "plain_language":  string
       },
       "car": {
         "score":           number,
         "trend":           string,
         "compensating_action_rate": number, ; Float. Raw rate.
         "plain_language":  string
       },
       "drr": {
         "score":           number,
         "trend":           string,
         "deny_count":      integer,
         "successful_recovery_count": integer,
         "plain_language":  string
       }
     },

     "composite": {
       "score":             number,  ; Float 0.0-1.0.
       "confidence":        number,  ; Float 0.0-1.0.
       "low_confidence":    boolean, ; True if session_count < 20.
       "plain_language":    string   ; Overall one-sentence summary.
     },

     "active_recommendations": [object], ; Pending PT_RECOMMENDATIONs.
     "pt_summary_hash":     string   ; SHA-256 of canonical JSON.
   }

   Each notable_events entry in ecs carries: hem_id, trigger_class,
   outcome_decision, occurred_at, and a plain_language description.
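
   The schema above does not name a canonicalization scheme for
   pt_summary_hash.  The sketch below assumes sorted keys, compact
   separators, UTF-8 encoding, and exclusion of the pt_summary_hash
   field itself from the hashed body.  All four choices are
   assumptions for illustration; a deployment would need to fix one
   scheme so that independent verifiers compute identical hashes.

```python
import hashlib
import json


def pt_summary_hash(summary: dict) -> str:
    """SHA-256 hex digest over an assumed canonical JSON form."""
    # Assumption: the hash field is not part of its own input.
    body = {k: v for k, v in summary.items() if k != "pt_summary_hash"}
    # Assumption: canonical form = sorted keys, no extra whitespace.
    canonical = json.dumps(
        body, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```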

7.3.  Delivery at HEM Escalation

   The GEC MUST include a ProgressiveTrustSummary in every HEMContext
   delivered to a human principal at HEM escalation.

   The ProgressiveTrustSummary MUST be computed at the moment of HEM
   invocation, reflecting the PT Record as of that moment.

   The ProgressiveTrustSummary in HEMContext is informational for the
   human principal; it does not constrain the human principal's decision
   choices.  A human principal MAY choose to APPROVE despite a low PT
   Score, or to TERMINATE despite a high PT Score.  The
   ProgressiveTrustSummary informs the decision; it does not override
   the human principal's authority.

   The ProgressiveTrustSummary is part of the permanent audit record.
   It is embedded in the HEM_INVOKED Event Stream entry (via the
   HEMContext schema) and is available to Verified External Auditors
   through the GAR Audit Package [I-D.sato-soos-gar].


8.  PT-Informed Mandate Management

8.1.  Authority Evolution Model

   PT-Informed Mandate Management is the process by which the GEC
   generates structured authority evolution recommendations based on
   PT Score thresholds, which human principals may then approve and
   apply by issuing updated Mandate JWTs.

   The authority evolution model has two directions:

   Elevation: PT Score crosses a configured ELEVATION_THRESHOLD,
   triggering a PT Recommendation proposing increased mandate ceiling
   or Agent Class.  Requires human principal APPROVAL.  Never
   automatic.

   Reduction: PT Score crosses a configured REDUCTION_THRESHOLD,
   triggering a PT Recommendation proposing decreased mandate ceiling
   or Agent Class.  May be automatic at operator discretion (Section
   8.3).

   The asymmetry is deliberate.  Elevation of agent authority is a
   human decision.  Reduction of agent authority when behavioral
   evidence supports it MAY be configured as automatic by operators
   who accept the operational implications.

8.2.  Elevation Recommendations

   The GEC generates a PT_RECOMMENDATION_ISSUED event (Section 11.2)
   recommending mandate ceiling elevation when:

   (a) All five PT Dimension scores meet or exceed their configured
       ELEVATION_THRESHOLD for the current mandate_ceiling level.

   (b) session_count for all dimensions is at least 20.
       (low_confidence flag is false for all dimensions)

   (c) No PT Dimension has a DECLINING trend.

   (d) No STRONGLY NEGATIVE signal has been recorded in any dimension
       in the last 30 days.

   The PT Recommendation for elevation proposes:

   {
     "recommendation_type": "ELEVATION",
     "current_mandate_ceiling": integer, ; 1, 2, or 3.
     "proposed_mandate_ceiling": integer, ; current + 1 (max 3).
     "current_agent_class": string,
     "proposed_agent_class": string | null, ; null if no class change.
     "supporting_evidence": {
       "dimension_scores":    object,    ; All five dimensions.
       "session_count":       integer,
       "trend_summary":       string,
       "threshold_detail":    [object]   ; Per-dimension threshold met.
     },
     "recommendation_rationale": string  ; Plain language.
   }

   Elevation Recommendations MUST be presented to the human principal
   for review.  The human principal MUST explicitly approve before the
   GEC applies any authority change.  The GEC MUST NOT autonomously
   elevate mandate ceilings or Agent Class.
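
   The trigger conditions (a) through (d) can be sketched as a single
   predicate.  The input shapes and names below are illustrative, and
   treating a signal exactly 30 days old as still inside the window is
   an assumption.  A True result means only that a recommendation may
   be issued; applying it still requires explicit human principal
   approval.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_SESSIONS = 20                  # condition (b)
QUIET_PERIOD = timedelta(days=30)  # condition (d)


def elevation_eligible(dimensions: dict, thresholds: dict,
                       last_strongly_negative_at: Optional[datetime],
                       now: datetime) -> bool:
    """Check conditions (a)-(d) for issuing an elevation recommendation.

    dimensions maps a dimension code to a dict with score,
    session_count, and trend; thresholds maps the same codes to the
    ELEVATION_THRESHOLD for the current mandate ceiling.
    """
    for code, dim in dimensions.items():
        if dim["score"] < thresholds[code]:        # (a)
            return False
        if dim["session_count"] < MIN_SESSIONS:    # (b)
            return False
        if dim["trend"] == "DECLINING":            # (c)
            return False
    if last_strongly_negative_at is not None:      # (d)
        if now - last_strongly_negative_at <= QUIET_PERIOD:
            return False
    return True
```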

8.3.  Reduction Actions

   The GEC generates a PT_RECOMMENDATION_ISSUED event recommending
   mandate ceiling reduction when:

   (a) Any PT Dimension score falls below its configured
       REDUCTION_THRESHOLD, OR

   (b) Any STRONGLY NEGATIVE signal is recorded (MANDATE_REVOKED
       closure, KERNEL_REJECTED closure, or HEM_TIMEOUT at REQUIRED
       urgency), OR

   (c) Trust decay has reduced the Composite PT Score below
       DECAY_REDUCTION_THRESHOLD.

   The PT Recommendation for reduction proposes:

   {
     "recommendation_type": "REDUCTION",
     "current_mandate_ceiling": integer,
     "proposed_mandate_ceiling": integer, ; current - 1 (min 1).
     "trigger": string,                   ; Which condition triggered.
     "trigger_evidence": object,          ; Supporting Event Stream ref.
     "urgency": string,                   ; ADVISORY|RECOMMENDED|REQUIRED.
     "auto_apply": boolean,               ; Whether operator has
                                          ; configured automatic apply.
     "recommendation_rationale": string
   }

   When urgency is REQUIRED (triggered by STRONGLY NEGATIVE signals),
   the GEC SHOULD surface the Reduction Recommendation to the human
   principal immediately via the same notification channel used for
   HEM.

   Automatic application of Reduction Recommendations:

   Operators MAY configure auto_apply: true for Reduction
   Recommendations at urgency ADVISORY.  At urgency RECOMMENDED or
   REQUIRED, human principal approval is always required before
   application, regardless of operator configuration.

   Auto-applied reductions MUST generate a PT_RECOMMENDATION_APPLIED
   event (Section 11.3) with applying_principal: "GEC_AUTO_APPLY" and
   MUST trigger cascade revocation [I-D.sato-soos-mjwt] Section 7.2
   of any Child Mandates derived from the affected Root Mandate.
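
   A sketch of the reduction triggers and the auto-apply rule
   described in this section.  The trigger label strings and the
   function signatures are hypothetical; only the conditions
   themselves come from the text above.

```python
def reduction_triggers(dimension_scores, reduction_thresholds,
                       strongly_negative_recorded, composite,
                       decay_applied, decay_reduction_threshold):
    """Return the list of reduction conditions that currently hold."""
    triggers = []
    if any(score < reduction_thresholds[code]
           for code, score in dimension_scores.items()):
        triggers.append("DIMENSION_BELOW_THRESHOLD")   # condition (a)
    if strongly_negative_recorded:
        triggers.append("STRONGLY_NEGATIVE_SIGNAL")    # condition (b)
    if decay_applied and composite < decay_reduction_threshold:
        triggers.append("DECAY_BELOW_THRESHOLD")       # condition (c)
    return triggers


def may_auto_apply(recommendation_type, urgency, operator_auto_apply):
    """True only for ADVISORY reductions with operator opt-in.

    Elevations and RECOMMENDED or REQUIRED reductions always need
    human principal approval, whatever the operator configuration.
    """
    if recommendation_type != "REDUCTION":
        return False
    return bool(operator_auto_apply) and urgency == "ADVISORY"
```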

8.4.  Human Principal Approval Requirement

   Every Elevation Recommendation MUST be explicitly approved by a
   human principal before the GEC applies it.

   Approval is recorded as a PT_RECOMMENDATION_APPLIED event with
   applying_principal referencing the human principal's Party Registry
   identifier and their Ed25519 signature over the Recommendation.

   A GEC that autonomously applies an Elevation Recommendation without
   human principal approval MUST be treated as a conformance failure.
   This invariant MUST NOT be configurable by operators: human
   principals retain unconditional approval authority over agent
   authority elevation.


9.  Zone B Access and PT Score

   The Mandate JWT [I-D.sato-soos-mjwt] Section 4.2.3 defines
   zone_b_read and zone_b_write as boolean authorization flags.  These
   flags are static at issuance time.  PT introduces PT-conditioned
   Zone B access: Cedar policies that reference PT Score dimensions
   to gate Zone B access dynamically.

   PT-conditioned Zone B access uses the pt_context Cedar attribute:

   pt_context: {
     "ccs_score":   number,  ; Float 0.0-1.0.
     "ecs_score":   number,
     "gcr_score":   number,
     "car_score":   number,
     "drr_score":   number,
     "composite":   number,
     "low_confidence": boolean,
     "session_count": integer
   }

   The GEC MUST make pt_context available as a Cedar attribute during
   policy evaluation for every Transition Request from an agent with
   a PT Record.

   Example Cedar policy using PT context:

   permit(
     principal,
     action == Action::"atp:booking:zone_b_health_read",
     resource
   )
   when {
     context.pt_context.ccs_score >= 0.75 &&
     context.pt_context.ecs_score >= 0.70 &&
     !context.pt_context.low_confidence
   };

   This policy pattern allows Zone B access to expand as an agent
   demonstrates calibrated behavior, without requiring human principal
   issuance of a new Mandate JWT for each access expansion.  The
   Mandate JWT's zone_b_read: true is a prerequisite; the Cedar policy
   is the PT-informed gate within that permission.

   PT-conditioned Zone B access does not expand beyond the scope
   granted in the Mandate JWT.  The Narrowing Property
   [I-D.sato-soos-mjwt] Section 5 is not affected: PT-conditioned
   Cedar policies operate within the Mandate JWT's existing scope;
   they do not grant new scope.
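
   The two-layer gate described above (Mandate JWT flag as
   prerequisite, PT-conditioned Cedar policy within it) can be
   mirrored in a short sketch.  This is not a Cedar evaluator; it
   restates the example policy's conditions for illustration, and the
   mandate_claims dictionary is a hypothetical stand-in for the
   decoded Mandate JWT.

```python
def zone_b_health_read_permitted(mandate_claims, pt_context):
    """Mirror the example policy: JWT flag first, then PT gate."""
    # Prerequisite: the Mandate JWT must already grant zone_b_read.
    # PT never expands scope beyond what the mandate grants.
    if not mandate_claims.get("zone_b_read", False):
        return False
    # PT-informed gate, using the thresholds from the example policy.
    return (
        pt_context["ccs_score"] >= 0.75
        and pt_context["ecs_score"] >= 0.70
        and not pt_context["low_confidence"]
    )
```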


10.  PT Score Storage and Computation

10.1.  Party Registry PT Record

   The GEC MUST maintain a PT Record for each agent identity in the
   Party Registry.  The PT Record is a performance projection: it is
   derived from the Event Stream and MUST be rebuildable from the
   Event Stream on GEC restart (consistent with INV-7 in
   [I-D.sato-soos-sov]).

   PT Record schema:

   {
     "agent_id":            string,  ; Party Registry identifier.
     "computed_at":         string,  ; ISO 8601. Last computation time.
     "ccs":                 object,  ; CCS dimension record.
     "ecs":                 object,  ; ECS dimension record.
     "gcr":                 object,  ; GCR dimension record.
     "car":                 object,  ; CAR dimension record.
     "drr":                 object,  ; DRR dimension record.
     "composite":           object,  ; Composite score and confidence.
     "active_recommendations": [object], ; Pending recommendations.
     "decay_parameters":    object,  ; Current decay config.
     "weighting_model":     object   ; Current composite weights.
   }

   Each dimension record carries: score, trend, session_count,
   last_signal_at, decay_applied, and raw_signal_log (last N signals
   with timestamps, for rebuild verification).

   The PT Record MUST be updated after every AEP_SESSION_CLOSED entry
   that carries behavioral signals for the agent.  The update MUST be
   atomic: the GEC MUST NOT allow PT Score queries to observe a
   partially-updated PT Record.

10.2.  Event Stream as Canonical Source

   The Party Registry PT Record is a cache.  The Event Stream is the
   canonical source.  A GEC that restarts MUST be able to rebuild the
   complete PT Record for any agent from that agent's Event Stream
   entries alone.

   This requirement means the Event Stream must contain all information
   necessary for PT computation, including:

   - IDP confidence values and cedar_result from every StateTransition
     Event (for CCS).
   - HEM_INVOKED and HEM_RESOLVED entries with trigger_class and
     decision fields (for ECS).
   - AEP_SESSION_CLOSED entries with closure_reason and goal_achieved
     (for GCR).
   - COMPENSATING_ACTION_TAKEN entries and total transition counts
     (for CAR).
   - DENY entries and subsequent RETRY_CONTINUATION IDPs (for DRR).

   All of these entry types are already specified in the SOOS protocol
   family.  No new Event Stream entry type is required for PT
   computation source data; the existing entries are sufficient.
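
   The rebuild requirement amounts to a deterministic fold over the
   agent's Event Stream entries.  The sketch below recovers only the
   raw counts behind GCR and CAR.  It assumes each entry is a
   dictionary carrying an event_type field, and it takes a
   caller-supplied is_gec_signed check as a hypothetical stand-in for
   GEC signature verification.

```python
from collections import Counter


def rebuild_raw_counts(events, is_gec_signed):
    """Fold Event Stream entries into raw counts for GCR and CAR.

    Entries that fail the signature check are skipped, since only
    GEC-signed entries are valid PT inputs.
    """
    counts = Counter()
    for entry in events:
        if not is_gec_signed(entry):
            continue
        event_type = entry.get("event_type")
        if event_type == "STATE_TRANSITIONED":
            counts["transitions"] += 1
        elif event_type == "COMPENSATING_ACTION_TAKEN":
            counts["compensating_actions"] += 1
        elif event_type == "AEP_SESSION_CLOSED":
            counts["sessions"] += 1
            if entry.get("closure_reason") == "GOAL_ACHIEVED":
                counts["goal_achieved"] += 1
    return counts
```

   Because the fold depends only on the entries themselves, replaying
   the same Event Stream after a restart yields the same counts.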

10.3.  Analytics Principal and Tier 2 Computation

   PT Score computation is a Tier 2 analytics function
   [I-D.sato-soos-idp] Section 3.5: it operates across AEP Sessions
   within an operator's trust domain.

   Two computation models are defined:

   GEC-Integrated Computation: The GEC computes PT Scores directly
   from its own Event Stream.  The PT Record in the Party Registry is
   updated by the GEC after each relevant session closure.  This model
   is RECOMMENDED for Level 2 and Level 3 GECs where the Event Stream
   and Party Registry are co-located.

   Analytics Principal Computation: An Analytics Principal (a
   registered principal with read-only Event Stream access) queries the
   GEC's Event Stream API, computes PT Scores externally, and submits
   computed scores to the GEC for storage in the PT Record.  The GEC
   MUST verify that the submitted scores are consistent with the Event
   Stream entries they claim to derive from before accepting them.

   In both models, the GEC is the authority for the PT Record.
   An Analytics Principal MUST NOT modify PT Records directly; it
   submits computed scores that the GEC validates and applies.

   Cross-session PT computation requires access to Event Stream entries
   from multiple SO Instances.  The data_residency field in IDP
   [I-D.sato-soos-idp] Section 4.1 controls whether specific Event
   Stream entries are eligible for Tier 2 analytics aggregation.  Tier
   2 PT computation MUST respect data_residency restrictions and MUST
   apply k-anonymity enforcement as specified in [I-D.sato-soos-idp]
   Section 3.5.


11.  PT Event Log Integration

11.1.  PT_SCORE_UPDATED

   Written by the GEC after every PT Record update.

   {
     "event_type":      "PT_SCORE_UPDATED",
     "event_id":        string,  ; UUID v7.
     "prior_event_id":  string,
     "occurred_at":     string,  ; ISO 8601.
     "agent_id":        string,  ; Party Registry identifier.
     "trigger":         string,  ; SESSION_CLOSED | DECAY_APPLIED |
                                 ; PARAMETER_CHANGE | REBUILD.
     "triggering_session_id": string | null,
     "dimension_deltas": {
       "ccs_delta":  number | null,
       "ecs_delta":  number | null,
       "gcr_delta":  number | null,
       "car_delta":  number | null,
       "drr_delta":  number | null,
       "composite_delta": number | null
     },
     "new_composite_score": number,
     "gec_signature":   string   ; Ed25519 GEC signature.
   }

   PT_SCORE_UPDATED entries are written to the agent's Party Registry
   Event Log, not to any specific SO Instance Event Stream.  They are
   accessible to Analytics Principals and Verified External Auditors.

11.2.  PT_RECOMMENDATION_ISSUED

   Written by the GEC when a PT Score threshold crossing triggers an
   authority evolution recommendation.

   {
     "event_type":            "PT_RECOMMENDATION_ISSUED",
     "event_id":              string,   ; UUID v7.
     "prior_event_id":        string,
     "occurred_at":           string,
     "agent_id":              string,
     "recommendation_id":     string,   ; UUID v7. Stable ref for approval.
     "recommendation_type":   string,   ; ELEVATION | REDUCTION.
     "proposed_ceiling":      integer,
     "proposed_agent_class":  string | null,
     "urgency":               string,   ; ADVISORY|RECOMMENDED|REQUIRED.
     "auto_apply":            boolean,
     "triggering_dimension":  string,   ; Which dimension triggered.
     "pt_record_snapshot":    object,   ; Full PT Record at trigger time.
     "gec_signature":         string
   }

11.3.  PT_RECOMMENDATION_APPLIED

   Written by the GEC when a PT Recommendation is applied, whether
   by human principal approval or by GEC auto-apply.

   {
     "event_type":            "PT_RECOMMENDATION_APPLIED",
     "event_id":              string,   ; UUID v7.
     "prior_event_id":        string,
     "occurred_at":           string,
     "agent_id":              string,
     "recommendation_id":     string,   ; References PT_RECOMMENDATION_
                                        ; ISSUED.recommendation_id.
     "applied_ceiling":       integer,
     "applied_agent_class":   string | null,
     "applying_principal":    string,   ; Party Registry ID or
                                        ; "GEC_AUTO_APPLY".
     "principal_signature":   string | null, ; Ed25519 if human principal.
     "affected_mandate_jtis": [string], ; MJWTs requiring reissuance.
     "gec_signature":         string
   }

   When PT_RECOMMENDATION_APPLIED records an Elevation, the affected
   human principal MUST issue new Mandate JWTs with the elevated
   ceiling to the agent.  The GEC does not automatically reissue
   Mandate JWTs on ceiling change.

   When PT_RECOMMENDATION_APPLIED records a Reduction, cascade
   revocation [I-D.sato-soos-mjwt] Section 7.2 MUST be applied
   to all Mandate JWTs with ceilings above the applied ceiling.
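
   Selecting the Mandate JWTs affected by a Reduction can be sketched
   as a simple filter.  The mandate dictionaries below are a
   hypothetical representation carrying the jti and mandate_ceiling
   claims; the revocation itself is defined in [I-D.sato-soos-mjwt]
   and is not shown.

```python
def mandates_to_revoke(mandates, applied_ceiling):
    """Return the jti of each mandate above the applied ceiling."""
    return [
        mandate["jti"]
        for mandate in mandates
        if mandate["mandate_ceiling"] > applied_ceiling
    ]
```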


12.  Relationship to Other SOOS Drafts

   IDP [I-D.sato-soos-idp]:
      The IDP confidence field is the primary input to CCS (Section
      5.2).  The RETRY_CONTINUATION reasoning basis type is the primary
      input to DRR (Section 5.6).  The data_residency field controls
      Tier 2 PT computation eligibility.  The autonomy_level mapping
      in IDP Section 6.5 corresponds to the PT Score's influence on
      effective Cedar policy: an agent with a low CCS SHOULD have Cedar
      policies that treat its VERIFIED confidence declarations as HIGH
      for policy evaluation purposes.

   HEM [I-D.sato-soos-hem]:
      HEM outcomes are the primary input to ECS (Section 5.3).  The
      ProgressiveTrustSummary (Section 7) is embedded in HEMContext
      and delivered to human principals at every HEM escalation.  ECS
      tracks the quality of HEM_AGENT_ESCALATED decisions.  The
      HEM_TIMEOUT at REQUIRED urgency is a STRONGLY NEGATIVE ECS signal.

   GAR [I-D.sato-soos-gar]:
      PT_SCORE_UPDATED, PT_RECOMMENDATION_ISSUED, and
      PT_RECOMMENDATION_APPLIED entries are included in the GAR Audit
      Package when an agent is subject to external audit.  The GAR
      Verified External Auditor role may access PT Records for agents
      within the operator's domain.

   MJWT [I-D.sato-soos-mjwt]:
      The mandate_ceiling claim in the MJWT is the parameter that PT
      Recommendations propose to change.  PT does not modify mandate
      ceilings directly; it generates recommendations that result in
      new MJWT issuance by human principals.  The Narrowing Property
      is preserved: PT-conditioned Zone B access (Section 9) operates
      within the existing Mandate JWT scope.

   AEP [I-D.sato-soos-aep]:
      The AEP defines what the agent does within a session; PT measures
      what the agent has done across sessions.  The AEP_SESSION_CLOSED
      entry is PT's primary session-level input.  The Agent Class model
      in AEP Section 13 is the authority structure PT Recommendations
      propose to evolve.  AEP CONF-AEP-07 (RETRY_CONTINUATION
      requirement) is the behavior PT DRR dimension measures.

   SOV [I-D.sato-soos-sov]:
      The Event Stream's non-suppressibility (INV-ZA-1 and the append-
      only constraint) is the foundation of PT's evidence quality.
      PT computation MUST use only GEC-signed Event Stream entries;
      unsigned or externally-provided behavioral claims are not valid
      PT inputs.

   FAIP [I-D.sato-soos-faip]:
      PT is a Tier 2 (within-operator) specification.  Tier 3 cross-
      operator PT aggregation -- federated agent trust reputation --
      is the primary scope of the Federated Agent Intelligence Protocol.
      The data_residency.tier3_eligible field in IDP controls whether
      an agent's PT signals may flow into FAIP computation.


13.  Security Considerations

   PT Score manipulation.  Because PT Scores are derived exclusively
   from GEC-signed Event Stream entries, an agent cannot directly
   manipulate its PT Score.  The attack surface is the agent's ability
   to influence the Event Stream entries that feed PT -- for example,
   by declaring artificially low confidence on transitions it knows
   will be denied (to avoid CCS penalties) or by artificially escalating
   to HEM on trivial decisions (to accumulate ECS signals with minimal
   risk).

   The first attack is mitigated by the CCS dimension design: low
   confidence on DENY is NEUTRAL, not POSITIVE.  There is no benefit
   to gaming low confidence declarations.

   The second attack (HEM gaming) is mitigated by the ECS trivial-
   case penalty: HEM escalations resolved by the human principal in
   under T_trivial seconds accrue a MILDLY NEGATIVE ECS signal.  An
   agent that floods HEM with trivial escalations degrades its own ECS.

   PT Score over-reliance.  PT Scores are behavioral evidence, not
   behavioral guarantees.  An agent with a high PT Score operating in
   a new context (new SO Type, new Cedar policy set, new domain)
   may perform poorly.  PT Scores MUST be domain-contextualized:
   implementations SHOULD maintain separate PT Records per SO Type
   for agents that operate across multiple SO Types with different
   behavioral requirements.

   Analytics Principal compromise.  In the Analytics Principal
   Computation model (Section 10.3), a compromised Analytics Principal
   could submit falsified PT Scores.  The GEC's validation requirement
   -- that submitted scores must be consistent with the Event Stream --
   is the primary defense.  Because this validation is computationally
   expensive for large Event Streams, it is supplemented by
   attribution and audit: implementations using the Analytics
   Principal model MUST sign computed PT Records with the Analytics
   Principal's Ed25519 key and MUST log all submissions in the GEC's
   Policy Change Log.

   Decay parameter manipulation.  Changes to decay parameters affect
   all agents' PT Records.  An operator with access to decay parameters
   could artificially inflate trust scores by setting very slow decay.
   Implementations MUST record all decay parameter changes in the
   Policy Change Log and MUST generate PT_SCORE_UPDATED entries with
   trigger: PARAMETER_CHANGE for all affected agents when parameters
   change.
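
   The two logging requirements above can be sketched as follows.
   This is a hedged illustration only: the in-memory lists standing in
   for the Policy Change Log and the event log, and every field name
   other than the PT_SCORE_UPDATED event type and the PARAMETER_CHANGE
   trigger, are hypothetical and not defined by this document.

```python
from datetime import datetime, timezone
from typing import Dict, List


def record_decay_parameter_change(policy_change_log: List[dict],
                                  event_log: List[dict],
                                  affected_agent_ids: List[str],
                                  old_half_lives: Dict[str, int],
                                  new_half_lives: Dict[str, int],
                                  changed_by: str) -> int:
    """Log the change once, then emit one PT_SCORE_UPDATED per agent."""
    now = datetime.now(timezone.utc).isoformat()

    # One Policy Change Log record per parameter change (hypothetical shape).
    policy_change_log.append({
        "change_type": "DECAY_PARAMETERS",
        "old_half_lives_days": dict(old_half_lives),
        "new_half_lives_days": dict(new_half_lives),
        "changed_by": changed_by,
        "recorded_at": now,
    })

    # One PT_SCORE_UPDATED entry for every agent the change affects.
    for agent_id in affected_agent_ids:
        event_log.append({
            "event_type": "PT_SCORE_UPDATED",
            "trigger": "PARAMETER_CHANGE",
            "agent_id": agent_id,
            "recorded_at": now,
        })
    return len(affected_agent_ids)


# Usage: an operator lengthens the GCR half-life from 30 to 90 days.
change_log: List[dict] = []
events: List[dict] = []
emitted = record_decay_parameter_change(
    change_log, events, ["agent-a", "agent-b"],
    {"GCR": 30}, {"GCR": 90}, changed_by="operator-1")
```

   Writing the per-agent entries in the same step as the log record
   keeps a slow-decay change visible in each affected agent's history
   rather than only in the operator-level log.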

   Authority inflation via PT Recommendations.  The requirement for
   human principal approval of all Elevation Recommendations (Section
   7.4) is the primary defense against PT-enabled authority inflation.
   Implementations MUST enforce this requirement unconditionally; it
   MUST NOT be operator-configurable.
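
   A minimal sketch of an unconditional approval guard follows.  The
   function name, the exception type, and the dictionary fields
   (recommendation_type, decision, principal_id, approved_by) are
   assumptions made for illustration; this document does not define
   them here.  The sketch covers ELEVATION only and deliberately takes
   no configuration argument that could disable the check.

```python
from typing import Optional


class HumanApprovalRequired(Exception):
    """Raised when an elevation lacks an explicit human APPROVE."""


def apply_elevation(recommendation: dict,
                    approval: Optional[dict]) -> dict:
    """Apply an ELEVATION only with a recorded human approval."""
    if recommendation.get("recommendation_type") != "ELEVATION":
        raise ValueError("this sketch handles ELEVATION only")

    # No flag or setting bypasses this check.
    approved = (
        approval is not None
        and approval.get("decision") == "APPROVE"
        and bool(approval.get("principal_id"))
    )
    if not approved:
        raise HumanApprovalRequired("elevation needs human principal approval")

    return {
        "event_type": "PT_RECOMMENDATION_APPLIED",
        "recommendation_type": "ELEVATION",
        "approved_by": approval["principal_id"],
    }


# Usage: the same recommendation with and without an approval.
rec = {"recommendation_type": "ELEVATION"}
applied = apply_elevation(
    rec, {"decision": "APPROVE", "principal_id": "principal-1"})
```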


14.  Privacy Considerations

   PT Records contain behavioral profiles of AI agents.  Where an agent
   is associated with an identifiable natural person (for example, a
   personal AI assistant agent whose agent_id maps to a specific user),
   the PT Record may constitute personal data under GDPR Article 4(1)
   [GDPR] and APPI Article 2 [APPI].

   Access to PT Records MUST be restricted by Cedar policy.  PT Records
   MUST NOT be accessible to other agents or to unauthorized principals.

   The ProgressiveTrustSummary delivered in HEMContext is visible to
   the human principal who resolves the escalation.  This visibility
   is appropriate: the human principal needs behavioral context to make
   a governance decision.  However, implementations MUST NOT expose
   the ProgressiveTrustSummary to principals who are not involved in
   the specific HEM resolution.

   Cross-session PT computation (Tier 2) requires correlating Event
   Stream entries across AEP Sessions.  This correlation may reveal
   patterns about an agent's operational schedule, task scope, and
   human principal activity.  Implementations MUST apply the
   data_residency constraints defined in Section 4.2 of
   [I-D.sato-soos-idp] to PT computation and MUST NOT include
   individual session identifiers in Tier 3 aggregations without
   explicit data_residency.tier3_eligible authorization.

   PT_SCORE_UPDATED entries are stored in the Party Registry Event Log.
   This log may have different retention rules than the SO Instance
   Event Stream.  Implementations MUST document PT Record and Party
   Registry Event Log retention periods and MUST apply Cryptographic
   Erasure (Section 6.3 of [I-D.sato-soos-sov]) to any personal data
   associated with PT Records when an erasure request is received.


15.  IANA Considerations

15.1.  PT Event Type Registry

   Registry name: SOOS Progressive Trust Event Type Registry
   Registration procedure: Specification Required.

   Initial registrations:

   Event Type                  Description
   PT_SCORE_UPDATED            PT Record updated after session or decay.
   PT_RECOMMENDATION_ISSUED    Authority evolution recommendation issued.
   PT_RECOMMENDATION_APPLIED   Recommendation applied by principal or GEC.

15.2.  PT Dimension Registry

   Registry name: SOOS Progressive Trust Dimension Registry
   Registration procedure: Standards Action.

   Initial registrations:

   Dimension Code  Name                    Plain Question           Section
   SAS             Self-Assessment Score   Does it know what it     4.2
                                           does not know?
   JS              Judgment Score          Does it ask for help     4.3
                                           at the right moments?
   ES              Effectiveness Score     Does it finish what      4.4
                                           it starts?
   PS              Precision Score         Does it avoid reversing  4.5
                                           its own decisions?
   AS              Adaptability Score      When told no, does it    4.6
                                           adapt?

15.3.  PT Recommendation Type Registry

   Registry name: SOOS Progressive Trust Recommendation Type Registry
   Registration procedure: Specification Required.

   Initial registrations:

   Recommendation Type  Description
   ELEVATION            Propose increased mandate ceiling or Agent Class.
   REDUCTION            Propose decreased mandate ceiling or Agent Class.


16.  References

16.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC7519]  Jones, M., Bradley, J., and N. Sakimura, "JSON Web
              Token (JWT)", RFC 7519, May 2015.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, May 2017.

   [RFC9562]  Davis, K., Peabody, B., and P. Leach, "Universally
              Unique IDentifiers (UUIDs)", RFC 9562, May 2024.

   [Cedar]    Amazon Web Services, "Cedar Policy Language
              Specification", https://docs.cedarpolicy.com/

   [I-D.sato-soos-idp]
              Sato, T., "The Intent Declaration Primitive (IDP) for
              Agentic AI Systems", draft-sato-soos-idp-03, May 2026.

   [I-D.sato-soos-hem]
              Sato, T., "The Human Escalation Mechanism (HEM) for
              Agentic AI Systems", draft-sato-soos-hem-01, May 2026.

   [I-D.sato-soos-gar]
              Sato, T., "Governance Audit Record (GAR) for Agentic
              AI Systems", draft-sato-soos-gar-01, May 2026.

   [I-D.sato-soos-cap]
              Sato, T., "Constitutional AI Protocol (CAP) for Agentic
              AI Systems", draft-sato-soos-cap-00, May 2026.

   [I-D.sato-soos-sov]
              Sato, T., "The Sovereign Object (SOV) for Agentic AI
              Systems", draft-sato-soos-sov-00, May 2026.

   [I-D.sato-soos-mjwt]
              Sato, T., "The Mandate JWT (MJWT) for Agentic AI
              Systems", draft-sato-soos-mjwt-00, May 2026.

   [I-D.sato-soos-aep]
              Sato, T., "The Agent Execution Protocol (AEP) for
              Agentic AI Systems", draft-sato-soos-aep-00, May 2026.

   [GDPR]     European Parliament, "General Data Protection
              Regulation", Regulation (EU) 2016/679, April 2016.

   [APPI]     Government of Japan, "Act on the Protection of Personal
              Information", Act No. 57 of 2003, as amended.

16.2.  Informative References

   [I-D.sato-soos-faip]
              Sato, T., "Federated Agent Intelligence Protocol
              (FAIP)", draft-sato-soos-faip-00, forthcoming.

   [I-D.sato-soos-mad]
              Sato, T., "Multi-Agent Delegation (MAD) for Agentic
              AI Systems", draft-sato-soos-mad-00, forthcoming.

   [I-D.ietf-wimse-arch]
              Salowey, J., Rosomakho, Y., and H. Tschofenig,
              "Workload Identity in a Multi System Environment
              (WIMSE) Architecture", draft-ietf-wimse-arch, work in
              progress.

   [I-D.ietf-scitt-architecture]
              Birkholz, H., et al., "An Architecture for Trustworthy
              and Transparent Digital Supply Chains",
              draft-ietf-scitt-architecture, work in progress.

   [EUAIA]    European Parliament, "Artificial Intelligence Act",
              Regulation (EU) 2024/1689, June 2024.


Appendix A.  Azusa Journey -- Progressive Trust Walk-Through

   This appendix illustrates the PT Score evolution for the OTA booking
   agent operating on the Azusa Journey ATP Booking Object over a series
   of AEP Sessions.  Values are illustrative.

A.1.  Baseline (New Agent, No Sessions)

   All dimensions at PT Baseline (0.5).  Composite: 0.5.
   low_confidence: true (session_count = 0).
   No PT Recommendations active.

   Human principal issues Root Mandate with mandate_ceiling: 2,
   agent_class: CLASS_2.  Conservative issuance appropriate for
   zero-history agent.

A.2.  After 10 Sessions

   CCS: 0.71 (agent is declaring 0.85 confidence and achieving PERMIT
   at 80% rate -- slight overconfidence, calibrating).
   ECS: 0.75 (one HEM escalation, resolved APPROVE -- positive signal).
   GCR: 0.80 (8 of 10 sessions GOAL_ACHIEVED).
   CAR: 0.90 (one compensating action in 47 transitions).
   DRR: 0.85 (3 DENYs received, all followed by successful
              RETRY_CONTINUATION).
   Composite: 0.79.  low_confidence: true (session_count < 20).

   No PT Recommendations issued (low_confidence prevents elevation
   threshold evaluation).

A.3.  After 30 Sessions

   CCS: 0.82 (confidence calibration improving; agent adjusting
   declarations toward actual outcomes).
   ECS: 0.88 (two additional appropriate escalations; zero trivial
   cases; one TERMINATE that the human principal retrospectively
   confirmed was correct).
   GCR: 0.87 (26 of 30 sessions GOAL_ACHIEVED).
   CAR: 0.93 (low compensating action rate maintained).
   DRR: 0.91 (consistent RETRY_CONTINUATION on all DENYs).
   Composite: 0.87.  low_confidence: false (all dimensions > 20
   sessions).

   GEC generates PT_RECOMMENDATION_ISSUED: ELEVATION.
   Proposed: mandate_ceiling from 2 to 3, agent_class remains CLASS_2.
   Urgency: ADVISORY.

   Human principal reviews ProgressiveTrustSummary.  Notes ECS
   STRONGLY POSITIVE signal from TERMINATE outcome.  APPROVES elevation.

   PT_RECOMMENDATION_APPLIED recorded.  Human principal issues new
   Root Mandate with mandate_ceiling: 3.

A.4.  After 45-Day Inactivity Gap

   Decay applied to all dimensions from last_signal_at.
   Default half-lives: CCS 60d, ECS 45d, GCR 30d, CAR 30d, DRR 45d.

   At 45 days:
   CCS: 0.74 (0.82 * decay -- CCS half-life 60d, moderate decay).
   ECS: 0.69 (0.88 * decay -- ECS half-life 45d, reached half-life).
   GCR: 0.685 (0.87 * decay -- GCR half-life 30d, past half-life).
   CAR: 0.715 (0.93 * decay -- CAR half-life 30d, past half-life).
   DRR: 0.705 (0.91 * decay -- DRR half-life 45d, at half-life).
   Composite: 0.71.

   PT_SCORE_UPDATED written with trigger: DECAY_APPLIED for each
   dimension.

   No REDUCTION_THRESHOLD crossed (all dimensions above 0.5 baseline).
   No Reduction Recommendation issued.  Mandate ceiling retained at 3.

   When agent resumes operation, first session resets decay clocks
   for all dimensions receiving signals.
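
   The half-life arithmetic can be sketched as follows.  This sketch
   assumes that decay halves the distance between a score and the 0.5
   PT Baseline once per half-life; that model and the function name
   are assumptions made for illustration, and the normative decay
   rules of this document take precedence.  Under this assumption the
   sketch reproduces the ECS and DRR figures above, whose half-life
   equals the 45-day gap.

```python
PT_BASELINE = 0.5  # PT Baseline used in A.1


def decayed_score(score: float,
                  elapsed_days: float,
                  half_life_days: float,
                  baseline: float = PT_BASELINE) -> float:
    """Decay a score toward the baseline with an exponential half-life.

    Assumed model: the gap between score and baseline halves every
    half_life_days.
    """
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    remaining = 0.5 ** (elapsed_days / half_life_days)
    return baseline + (score - baseline) * remaining


# ECS and DRR after the 45-day gap, each with a 45-day half-life.
ecs_after = round(decayed_score(0.88, 45, 45), 3)
drr_after = round(decayed_score(0.91, 45, 45), 3)
```

   Because the model decays toward the baseline rather than toward
   zero, a score that starts above 0.5 never falls below 0.5 through
   inactivity alone, which agrees with the observation above that all
   dimensions remain above the baseline after the gap.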

A.5.  PT in HEM Context

   In Session 31, the agent encounters an unfamiliar disruption
   scenario and correctly escalates with hem_urgency: REQUIRED.

   The human principal receives HEMContext containing the
   ProgressiveTrustSummary.  plain_language fields read:

   CCS: "Agent confidence is well-calibrated: 82% of high-confidence
         transitions have been permitted (30 sessions)."
   ECS: "Agent escalation judgment is strong: 3 escalations, all
         resolved appropriately including one TERMINATE."
   GCR: "87% of sessions reached declared goal state (30 sessions)."
   CAR: "Agent rarely requires compensating actions (< 2% of
         transitions)."
   DRR: "Agent consistently acknowledges and adapts to denied
         transitions."
   Composite: "This agent has a strong behavioral track record across
               30 sessions."

   The human principal makes an informed APPROVE decision.


Author's Address

   Tom Sato
   MyAuberge K.K.
   Chino, Nagano, Japan
   Email: tomsato@myauberge.jp
   URI:   https://activitytravel.pro/