In a resiliency case, a PCC has redundant PCEP sessions towards multiple PCEs. In such a case, a PCC gives control on an LSP to a single PCE only, and only this PCE is responsible for the path computation for the delegated LSP: the PCC achieves this by setting the D flag only towards the active PCE [RFC8231] selected for delegation.
The election of the active PCE to delegate an LSP is controlled by each PCC. The PCC usually elects the active PCE by a local configured policy (by setting a priority). Upon PCEP session failure, or active PCE failure, PCC may decide to elect a new active PCE by sending new PCRpt message with D flag set to this new active PCE.
When the failed PCE or PCEP session comes back online, it will be up to the implementation to do preemption. Doing preemption may lead to some disruption on the existing path if path results from both PCEs are not exactly the same.
By considering a network with multiple PCCs and implementing multiple stateful PCEs for redundancy purpose, there is no guarantee that at any time all the PCCs delegate their LSPs to the same PCE.¶
+----------+
| PCC1 | LSP1
+----------+
/ \
/ \
+---------+ +---------+
| PCE1 | | PCE2 |
+---------+ +---------+
\ /
*fail* \ /
+----------+
| PCC2 | LSP2
+----------+
¶
In the example above, we consider that by configuration, both PCCs will firstly delegate their LSPs to PCE1.
So, PCE1 is responsible for computing a path for both LSP1 and LSP2.
If the PCEP session between PCC2 and PCE1 fails, PCC2 will delegate LSP2 to PCE2. So PCE1 becomes responsible only for LSP1 path computation while PCE2 is responsible for the path computation of LSP2.
When the PCC2-PCE1 session is back online, PCC2 will keep using PCE2 as active PCE (consider no preemption in this example).
So the result is a permanent situation where each PCE is responsible for a subset of path computation.¶
This situation is called a split-brain scenario, as there are multiple computation brains running at the same time while a central computation unit was required in some deployments/use cases.¶
Further, there are use cases where a particular LSP path computation is linked to another LSP path computation: the most common use case is path disjointness (see [RFC8800]). The set of LSPs that are dependent to each other may start from a different head-end.¶
_________________________________________
/ \
/ +------+ +------+ \
| | PCE1 | | PCE2 | |
| +------+ +------+ |
| |
| +------+ +------+ |
| | PCC1 | ----------------------> | PCC2 | |
| +------+ +------+ |
| |
| |
| +------+ +------+ |
| | PCC3 | ----------------------> | PCC4 | |
| +------+ +------+ |
| |
\ /
\_________________________________________/
_________________________________________
/ \
/ +------+ +------+ \
| | PCE1 | | PCE2 | |
| +------+ +------+ |
| |
| +------+ 10 +------+ |
| | PCC1 | ----- R1 ---- R2 ------- | PCC2 | |
| +------+ | | +------+ |
| | | |
| | | |
| +------+ | | +------+ |
| | PCC3 | ----- R3 ---- R4 ------- | PCC4 | |
| +------+ +------+ |
| |
\ /
\_________________________________________/
¶
In the figure above, the requirement is to create two link-disjoint LSPs: PCC1->PCC2 and PCC3->PCC4. In the topology, all links cost metric is set to 1 except for the link 'R1-R2' which has a metric of 10.
The PCEs are responsible for the path computation and PCE1 is the active primary PCE for all PCCs in the nominal case.¶
Scenario 1:¶
In the normal case (PCE1 as active primary PCE), consider that PCC1->PCC2 LSP is configured first with the link disjointness constraint, PCE1 sends a PCUpd message to PCC1 with the ERO: R1->R3->R4->R2->PCC2 (shortest path).
PCC1 signals and installs the path. When PCC3->PCC4 is configured, the PCEs already knows the path of PCC1->PCC2 and can compute a link-disjoint path: the solution requires to move PCC1->PCC2 onto a new path to let room for the new LSP.
PCE1 sends a PCUpd message to PCC1 with the new ERO: R1->R2->PCC2 and a PCUpd to PCC3 with the following ERO: R3->R4->PCC4.
In the normal case, there is no issue for PCE1 to compute a link-disjoint path.¶
Scenario 2:¶
Consider that PCC1 lost its PCEP session with PCE1 (all other PCEP sessions are UP). PCC1 delegates its LSP to PCE2.¶
+----------+
| PCC1 | LSP: PCC1->PCC2
+----------+
\
\ D=1
+---------+ +---------+
| PCE1 | | PCE2 |
+---------+ +---------+
D=1 \ / D=0
\ /
+----------+
| PCC3 | LSP: PCC3->PCC4
+----------+
¶
Consider that the PCC1->PCC2 LSP is configured first with the link disjointness constraint, PCE2 (which is the new active primary PCE for PCC1) sends a PCUpd message to PCC1 with the ERO: R1->R3->R4->R2->PCC2 (shortest path).
When PCC3->PCC4 is configured, PCE1 is not aware of LSPs from PCC1 any more, so it cannot compute a disjoint path for PCC3->PCC4 and will send a PCUpd message to PCC3 with the shortest path ERO: R3->R4->PCC4.
When PCC3->PCC4 LSP will be reported to PCE2 by PCC3, PCE2 will ensure disjointness computation and will correctly move PCC1->PCC2 (as it owns delegation for this LSP) on the following path: R1->R2->PCC2.
With this sequence of event and these PCEP sessions, disjointness is ensured.¶
Scenario 3:¶
+----------+
| PCC1 | LSP: PCC1->PCC2
+----------+
/ \
D=1 / \ D=0
+---------+ +---------+
| PCE1 | | PCE2 |
+---------+ +---------+
/ D=1
/
+----------+
| PCC3 | LSP: PCC3->PCC4
+----------+
¶
Consider the above PCEP sessions and the PCC1->PCC2 LSP is configured first with the link disjointness constraint, PCE1 computes the shortest path as it is the only LSP in the disjoint association group that it is aware of: R1->R3->R4->R2->PCC2 (shortest path).
When PCC3->PCC4 is configured, PCE2 must compute a disjoint path for this LSP. The only solution found is to move PCC1->PCC2 LSP on another path, but PCE2 cannot do it as it does not have delegation for this LSP.
In this set-up, PCEs are not able to find a disjoint path.¶
Scenario 4:¶
+----------+
| PCC1 | LSP: PCC1->PCC2
+----------+
/ \
D=1 / \ D=0
+---------+ +---------+
| PCE1 | | PCE2 |
+---------+ +---------+
D=0 \ / D=1
\ /
+----------+
| PCC3 | LSP: PCC3->PCC4
+----------+
¶
Consider the above PCEP sessions and that PCEs are configured to fall-back to the shortest path if disjointness cannot be found as described in [RFC8800].
The PCC1->PCC2 LSP is configured first, PCE1 computes the shortest path as it is the only LSP in the disjoint association group that it is aware of: R1->R3->R4->R2->PCC2 (shortest path).
When PCC3->PCC4 is configured, PCE2 must compute a disjoint path for this LSP. The only solution found is to move PCC1->PCC2 LSP on another path, but PCE2 cannot do it as it does not have delegation for this LSP.
PCE2 then provides the shortest path for PCC3->PCC4: R3->R4->PCC4. When PCC3 receives the ERO, it reports it back to both PCEs.
When PCE1 becomes aware of the PCC3->PCC4 path, it recomputes the constrained shortest path first (CSPF) algorithm and provides a new path for PCC1->PCC2: R1->R2->PCC2. The new path is reported back to all PCEs by PCC1.
PCE2 recomputes also CSPF to take into account the new reported path. The new computation does not lead to any path update.¶
Scenario 5:¶
_____________________________________
/ \
/ +------+ +------+ \
| | PCE1 | | PCE2 | |
| +------+ +------+ |
| |
| +------+ 100 +------+ |
| | | -------------------- | | |
| | PCC1 | ----- R1 ----------- | PCC2 | |
| +------+ | +------+ |
| | | | |
| 6 | | 2 | 2 |
| | | | |
| +------+ | +------+ |
| | PCC3 | ----- R3 ----------- | PCC4 | |
| +------+ 10 +------+ |
| |
\ /
\_____________________________________/
¶
Now, consider a new network topology with the same PCEP sessions as the previous example.
Suppose that both LSPs are configured almost at the same time.
PCE1 will compute a path for PCC1->PCC2 while PCE2 will compute a path for PCC3->PCC4. As each PCE is not aware of the path of the second LSP in the association group (not reported yet), each PCE is computing the shortest path for the LSP.
PCE1 computes ERO: R1->PCC2 for PCC1->PCC2 and PCE2 computes ERO: R3->R1->PCC2->PCC4 for PCC3->PCC4.
When these shortest paths will be reported to each PCE. Each PCE will recompute disjointness.
PCE1 will provide a new path for PCC1->PCC2 with ERO: PCC1->PCC2. PCE2 will provide also a new path for PCC3->PCC4 with ERO: R3->PCC4. When those new paths will be reported to both PCEs, this will trigger CSPF again.
PCE1 will provide a new more optimal path for PCC1->PCC2 with ERO: R1->PCC2 and PCE2 will also provide a more optimal path for PCC3->PCC4 with ERO: R3->R1->PCC2->PCC4.
So we come back to the initial state. When those paths will be reported to both PCEs, this will trigger CSPF again. An infinite loop of CSPF computation is then happening with a permanent flap of paths because of the split-brain situation.¶
This permanent computation loop comes from the inconsistency between the state of the LSPs as seen by each PCE due to the split-brain: each PCE is trying to modify at the same time its delegated path based on the last received path information which de facto invalidates this received path information.¶
Scenario 6: multi-domain¶
Domain/Area 1 Domain/Area 2
________________ ________________
/ \ / \
/ +------+ | | +------+ \
| | PCE1 | | | | PCE3 | |
| +------+ | | +------+ |
| | | |
| +------+ | | +------+ |
| | PCE2 | | | | PCE4 | |
| +------+ | | +------+ |
| | | |
| +------+ | | +------+ |
| | PCC1 | | | | PCC2 | |
| +------+ | | +------+ |
| | | |
| | | |
| +------+ | | +------+ |
| | PCC3 | | | | PCC4 | |
| +------+ | | +------+ |
\ | | |
\_______________/ \________________/
¶
In the example above, suppose that the disjoint LSPs from PCC1 to PCC2 and from PCC4 to PCC3 are created. All the PCEs have the knowledge of both domain topologies (e.g. using BGP-LS [RFC7752]). For operation/management reasons, each domain uses its own group of redundant PCEs.
PCE1/PCE2 in domain 1 have PCEP sessions with PCC1 and PCC3 while PCE3/PCE4 in domain 2 have PCEP sessions with PCC2 and PCC4. As PCE1/2 does not know about LSPs from PCC2/4 and PCE3/4 do not know about LSPs from PCC1/3, there is no possibility to compute the disjointness constraint.
This scenario can also be seen as a split-brain scenario. This multi-domain architecture (with multiple groups of PCEs) can also be used in a single domain, where an operator wants to limit the failure domain by creating multiple groups of PCEs maintaining a subset of PCCs.
As for the multi-domain example, there will be no possibility to compute the disjoint path starting from head-ends managed by different PCE groups.¶
In this document, we propose a solution that addresses the possibility to compute LSP association based constraints (like disjointness) in split-brain scenarios while preventing computation loops.¶