@edipascale
Contributor

Add an L3VNI to external VRFs and advertise the routes we learn from the external (filtered by BGP community) over EVPN. This way the gateway will see them as just another VPC, and we will be able to do external peering via the GW with no special code.

@github-actions

github-actions bot commented Aug 7, 2025

🚀 Temp artifacts published: v0-be377787f 🚀

@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from be37778 to 2d64064 Compare August 7, 2025 14:49
@github-actions

github-actions bot commented Aug 7, 2025

🚀 Temp artifacts published: v0-2d640644c 🚀

@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from 2d64064 to dcf672e Compare August 8, 2025 07:46
@github-actions

github-actions bot commented Aug 8, 2025

🚀 Temp artifacts published: v0-dcf672e97 🚀

@edipascale
Contributor Author

@Frostman I ran part of the release-test (specifically all tests with externals) in env-4 with these changes and everything worked fine. I could also manually check that we are advertising the default route via the external in the EVPN AF, and that no routes from the VPCs are advertised via the external. Obviously, the real test would be running this with the gateway and checking both whether external peering works and whether we can mix and match fabric and GW external peering without issues.

HOWEVER, the only way to upgrade the agents was to delete all VPCs and externals; otherwise the agent would complain about missing VNIs and/or hit gNMI errors when attempting to modify VNIs due to collisions. Also, the VNIs of the externals change depending on the number of VPCs in the system at any one time, which is not great (although once the upgrade was completed I did not observe errors like the gNMI one while testing). I think we should partition the VNI space, or at least have the externals come before the VPCs, given that they are less likely to change.
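To make the renumbering concrete, here is a minimal Go sketch of a single, sequentially allocated VNI space in which VPCs are allocated before externals (the order the comment above implies). The function name, the base of 100 and the stride of 100 are hypothetical, picked only to resemble the VNIs that show up later in this thread; this is not the fabric's actual allocator.

```go
package main

import (
	"fmt"
	"sort"
)

// allocateSequential is a hypothetical allocator, not the fabric's real one.
// It hands out VNIs from one shared space: first to the VPCs, then to the
// externals, each group sorted by name, starting at base and spaced by stride.
// Names are assumed to be unique across both groups.
func allocateSequential(vpcs, externals []string, base, stride uint32) map[string]uint32 {
	out := make(map[string]uint32, len(vpcs)+len(externals))
	next := base
	for _, group := range [][]string{vpcs, externals} {
		names := append([]string(nil), group...) // copy so the caller's slice is not reordered
		sort.Strings(names)
		for _, name := range names {
			out[name] = next
			next += stride
		}
	}
	return out
}

func main() {
	before := allocateSequential([]string{"vpc-01", "vpc-02"}, []string{"external-01"}, 100, 100)
	after := allocateSequential([]string{"vpc-01", "vpc-02", "vpc-03"}, []string{"external-01"}, 100, 100)
	// Creating one more VPC renumbers the external, even though it did not change.
	fmt.Println("external-01:", before["external-01"], "->", after["external-01"])
}
```

Running it prints `external-01: 300 -> 400`: the external's VNI depends on how many VPCs exist, which is why allocating externals first (or partitioning the space) reduces churn.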

@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from dcf672e to cebcb12 Compare August 25, 2025 07:55
@github-actions

🚀 Temp artifacts published: v0-cebcb123e 🚀

@edipascale
Contributor Author

edipascale commented Aug 29, 2025

I got back to this today and decided to try it in a VM (with the help of the virt-external branch) so that I could test the end-to-end flow with the gateway, but at some point I realized that the leaf attached to the external was not advertising the external routes to the spines. Other than the missing routes, the only indication that something was wrong was the output of this command on that leaf, whose external VRF has VNI 700:

leaf-03# show evpn vni 700
VNI: 700
  Type: L3
  Tenant VRF: VrfEexternal-01
  Local Vtep Ip: 0.0.0.0
  Local External Vtep Ip: 0.0.0.0
  Vxlan-Intf: None
  SVI-If: None
  State: Down
  Client State: Unknown
  VNI Filter: none
  System MAC: None
  Router MAC: None
  L2 VNIs:

Compare it with the VNI of a VPC VRF:

leaf-03# show evpn vni 600
VNI: 600
  Type: L3
  Tenant VRF: VrfVvpc-06
  Local Vtep Ip: 172.30.12.2
  Local External Vtep Ip: 0.0.0.0
  Vxlan-Intf: vtepfabric-3001
  SVI-If: Vlan3001
  State: Up
  Client State: Up
  VNI Filter: none
  System MAC: 0c:20:12:ff:02:09
  Router MAC: 0c:20:12:ff:02:09
  L2 VNIs: 601

I'm not sure whether something broke since I first tested this, or whether it has to do with virtual switches (as opposed to the hybrid envs where I had tried it before). I'm going to double check the output of this command on a hybrid env to make sure I'm not yet again wasting time due to VS limitations.

@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from cebcb12 to 83be71c Compare August 29, 2025 13:43
@github-actions

🚀 Temp artifacts published: v0-83be71c53 🚀

@edipascale
Contributor Author

> I'm going to double check the output of this command on a hybrid env to make sure I'm not yet again wasting time due to VS limitations.

No, env-1 shows the same behavior. Either something broke along the way, or I was hallucinating when I thought it worked in env-4.

@edipascale
Contributor Author

OK, problem solved - I was missing the dedicated IRB VLAN. Manually adding it solved the issue:

leaf-03# show running-configuration interface Vlan 3100
!
interface Vlan3100
 description "External-01 IRB"
 neigh-suppress
 ip vrf forwarding VrfEexternal-01
leaf-03# show running-configuration interface vxlan
!
interface vxlan vtepfabric
 source-ip Loopback2
 qos-mode pipe dscp 0
 map vni 500 vlan 3000
 map vni 501 vlan 1005
 map vni 600 vlan 3001
 map vni 601 vlan 1006
 map vni 700 vlan 3100
 map vni 700 vrf VrfEexternal-01
 map vni 500 vrf VrfVvpc-05
 map vni 600 vrf VrfVvpc-06
leaf-03# show evpn vni 700
VNI: 700
  Type: L3
  Tenant VRF: VrfEexternal-01
  Local Vtep Ip: 172.30.12.2
  Local External Vtep Ip: 0.0.0.0
  Vxlan-Intf: vtepfabric-3100
  SVI-If: Vlan3100
  State: Up
  Client State: Up
  VNI Filter: none
  System MAC: 0c:20:12:ff:02:09
  Router MAC: 0c:20:12:ff:02:09
  L2 VNIs:

and now on the spine:

spine-01# show bgp l2vpn evpn route type prefix ip 0.0.0.0/0
BGP table version is 2, local router ID is 172.30.8.0
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal, q queued, r RIB-failure
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
   Network          Next Hop            Metric LocPrf Weight Path
                    Extended Community
Route Distinguisher: 172.30.8.4:5096
*    [5]:[0]:[0]:[0.0.0.0]
                    172.30.12.2                                   0 65101 65100 65103 64102 ?
                    RT:65103:700 ET:8 Rmac:0c:20:12:ff:02:09
*    [5]:[0]:[0]:[0.0.0.0]
                    172.30.12.2                                   0 65534 65100 65103 64102 ?
                    RT:65103:700 Rmac:0c:20:12:ff:02:09
*    [5]:[0]:[0]:[0.0.0.0]
                    172.30.12.2                                   0 65102 65100 65103 64102 ?
                    RT:65103:700 ET:8 Rmac:0c:20:12:ff:02:09
*>   [5]:[0]:[0]:[0.0.0.0]
                    172.30.12.2              0                     0 65103 64102 ?
                    RT:65103:700 ET:8 Rmac:0c:20:12:ff:02:09
Displayed 1 prefixes (4 paths) (of requested type)

I need to update the branch to add this as well.
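A minimal Go sketch of the two pieces that had to be added by hand on leaf-03: an IRB SVI in the external's VRF, plus the VXLAN maps binding the L3VNI to that VLAN and to the VRF. The helper name is hypothetical, and the `VrfE` prefix and description format are copied from the CLI output above rather than from the branch's plan.go.

```go
package main

import (
	"fmt"
	"strings"
)

// externalL3VNIConfig (hypothetical) renders the SONiC CLI lines for an
// external's L3VNI: an SVI in the external's VRF plus the VNI->VLAN and
// VNI->VRF maps. In the output above, the L3VNI stayed Down until both
// the VLAN map and the SVI existed.
func externalL3VNIConfig(external string, vni, vlan uint32) (svi, vxlan []string) {
	if external == "" {
		return nil, nil
	}
	vrf := "VrfE" + external // naming observed in "show evpn vni" above
	desc := strings.ToUpper(external[:1]) + external[1:] + " IRB"
	svi = []string{
		fmt.Sprintf("interface Vlan%d", vlan),
		fmt.Sprintf(" description %q", desc),
		" neigh-suppress",
		" ip vrf forwarding " + vrf,
	}
	vxlan = []string{
		fmt.Sprintf(" map vni %d vlan %d", vni, vlan),
		fmt.Sprintf(" map vni %d vrf %s", vni, vrf),
	}
	return svi, vxlan
}

func main() {
	svi, vxlan := externalL3VNIConfig("external-01", 700, 3100)
	fmt.Println(strings.Join(svi, "\n"))
	fmt.Println("interface vxlan vtepfabric")
	fmt.Println(strings.Join(vxlan, "\n"))
}
```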

@edipascale
Contributor Author

After making the changes above and manually creating a gateway peering between vpc-3 and the "external-01" VPC (which the gateway sees as a standard VPC), I can see that pings from the peered VPC reach the virtual external, but the replies cannot be sent back, because the external has no route back to that subnet:

09:11:22.219598 enp2s1 In  ifindex 3 0c:20:12:ff:02:09 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 61, id 9369, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.3.2 > 1.0.0.1: ICMP echo request, id 24, seq 1, length 64
09:11:22.219598 enp2s1.10 In  ifindex 5 0c:20:12:ff:02:09 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 61, id 9369, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.3.2 > 1.0.0.1: ICMP echo request, id 24, seq 1, length 64
09:11:22.219817 enp2s0 Out ifindex 2 0c:20:12:fe:01:00 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 60, id 9369, offset 0, flags [DF], proto ICMP (1), length 84)
    172.31.1.10 > 1.0.0.1: ICMP echo request, id 24, seq 1, length 64
09:11:22.229279 enp2s0 In  ifindex 2 52:55:ac:1f:01:02 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 255, id 4936, offset 0, flags [DF], proto ICMP (1), length 84)
    1.0.0.1 > 172.31.1.10: ICMP echo reply, id 24, seq 1, length 64
09:11:22.229306 enp2s0 Out ifindex 2 0c:20:12:fe:01:00 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 254, id 4936, offset 0, flags [DF], proto ICMP (1), length 84)
    1.0.0.1 > 10.0.3.2: ICMP echo reply, id 24, seq 1, length 64

Adding a static route for the IPv4 namespace to the external attachment interface enp2s1.10 (which is probably not what we want long term anyway, for various reasons) did not fix it:

core@virt-external ~ $ sudo tcpdump -nevi any icmp
tcpdump: WARNING: any: That device doesn't support promiscuous mode
(Promiscuous mode not supported on the "any" device)
dropped privs to pcap
tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
09:50:18.176983 enp2s1 In  ifindex 3 0c:20:12:ff:02:09 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 61, id 59885, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.3.2 > 1.0.0.1: ICMP echo request, id 29, seq 1, length 64
09:50:18.176983 enp2s1.10 In  ifindex 5 0c:20:12:ff:02:09 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 61, id 59885, offset 0, flags [DF], proto ICMP (1), length 84)
    10.0.3.2 > 1.0.0.1: ICMP echo request, id 29, seq 1, length 64
09:50:18.177120 enp2s0 Out ifindex 2 0c:20:12:fe:01:00 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 60, id 59885, offset 0, flags [DF], proto ICMP (1), length 84)
    172.31.1.10 > 1.0.0.1: ICMP echo request, id 29, seq 1, length 64
09:50:18.186517 enp2s0 In  ifindex 2 52:55:ac:1f:01:02 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 255, id 6188, offset 0, flags [DF], proto ICMP (1), length 84)
    1.0.0.1 > 172.31.1.10: ICMP echo reply, id 29, seq 1, length 64
09:50:21.285784 enp2s0 Out ifindex 2 0c:20:12:fe:01:00 ethertype IPv4 (0x0800), length 132: (tos 0xc0, ttl 64, id 49535, offset 0, flags [none], proto ICMP (1), length 112)
    172.31.1.10 > 1.0.0.1: ICMP host 172.31.1.10 unreachable, length 92
        (tos 0x0, ttl 254, id 6188, offset 0, flags [DF], proto ICMP (1), length 84)
    1.0.0.1 > 172.31.1.10: ICMP echo reply, id 29, seq 1, length 64
^C
5 packets captured
6 packets received by filter
0 packets dropped by kernel
core@virt-external ~ $ ip r
default via 172.31.1.2 dev enp2s0 proto dhcp src 172.31.1.10 metric 1024
10.0.0.0/16 nhid 9 dev enp2s1.10 proto 196 metric 20
22.22.22.22 nhid 20 dev external-01 proto bgp metric 20
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.31.1.0/24 dev enp2s0 proto kernel scope link src 172.31.1.10 metric 1024
172.31.1.2 dev enp2s0 proto dhcp scope link src 172.31.1.10 metric 1024
172.31.1.3 dev enp2s0 proto dhcp scope link src 172.31.1.10 metric 1024
core@virt-external ~ $ ip r show vrf external-01
default nhid 17 via 172.31.1.2 dev enp2s0 proto 196 metric 20 onlink
10.0.0.0/16 nhid 23 dev enp2s1.10 proto bgp metric 20
100.1.10.0/24 dev enp2s1.10 proto kernel scope link src 100.1.10.6

It looks like this is interfering with the local NAT? How are we supposed to handle NAT anyway, when both the gateway and the external may be doing it?

@github-actions

github-actions bot commented Sep 1, 2025

🚀 Temp artifacts published: v0-bd3dd2168 🚀

@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from bd3dd21 to 488e5b1 Compare September 12, 2025 12:33
@github-actions

🚀 Temp artifacts published: v0-488e5b1fb 🚀

@edipascale
Contributor Author

@Frostman main thing that is missing now is updating gw_vpc_sync.go to also watch Externals and create VPCInfo objects for them. Do I need to create a separate manager for this?

@edipascale edipascale force-pushed the ema/externals-as-vpcs branch 2 times, most recently from 7f7ba71 to d129a26 Compare September 15, 2025 14:18
@github-actions

🚀 Temp artifacts published: v0-d129a26d6 🚀

@edipascale
Contributor Author

> @Frostman main thing that is missing now is updating gw_vpc_sync.go to also watch Externals and create VPCInfo objects for them. Do I need to create a separate manager for this?

I've replicated what we do for VPCs, and I manually confirmed that deleting/creating an external deletes/creates the corresponding VPCInfo object. Note, however, that I always associate the 0.0.0.0/0 prefix as the subnet of an external, since the external itself carries no information about which prefixes are allowed. This works for our default use case, but it may not be what we want in general?
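A minimal Go sketch of the approach described above: each External is published to the gateway as a VPCInfo-like object with a single catch-all 0.0.0.0/0 subnet (the Copilot review below mentions the subnet key "external"), and VPC names are validated so they cannot collide with the generated names (the file summary below mentions such a check against `VPCInfoExtPrefix`). The `vpcInfo` type, the helper names and the `ext-` prefix value are hypothetical stand-ins; the real constant's value is not shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// extPrefix stands in for VPCInfoExtPrefix; its real value is not shown in
// this thread, so "ext-" is a placeholder.
const extPrefix = "ext-"

// vpcInfo is a hypothetical, trimmed-down stand-in for the gateway's VPCInfo.
type vpcInfo struct {
	Name    string
	VNI     uint32
	Subnets map[string]string // subnet name -> CIDR
}

// vpcInfoForExternal publishes an External as a VPC-like object. The External
// carries no list of allowed prefixes, so a single catch-all subnet is used.
func vpcInfoForExternal(external string, vni uint32) vpcInfo {
	return vpcInfo{
		Name:    extPrefix + external,
		VNI:     vni,
		Subnets: map[string]string{"external": "0.0.0.0/0"},
	}
}

// validateVPCName rejects VPC names that could collide with generated names.
func validateVPCName(name string) error {
	if strings.HasPrefix(name, extPrefix) {
		return fmt.Errorf("vpc name %q must not start with %q", name, extPrefix)
	}
	return nil
}

func main() {
	fmt.Printf("%+v\n", vpcInfoForExternal("external-01", 100))
	fmt.Println(validateVPCName("ext-foo"))
}
```

The catch-all subnet keeps the gateway side simple, at the cost of the gateway having no finer-grained view of what the external actually accepts.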

Anyway, marking this as ready for review, as it's mature enough at least as a base for further discussion.

@edipascale edipascale marked this pull request as ready for review September 15, 2025 15:02
@edipascale edipascale requested a review from a team as a code owner September 15, 2025 15:02
@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from d129a26 to 69d57f7 Compare September 16, 2025 12:45
@github-actions

🚀 Temp artifacts published: v0-69d57f7f5 🚀

@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from 69d57f7 to 7e5a796 Compare September 17, 2025 08:09
@github-actions

🚀 Temp artifacts published: v0-7e5a796f8 🚀

@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from 7e5a796 to 8ae043a Compare September 23, 2025 08:34
@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from 4cd5cbc to 0e89f55 Compare October 14, 2025 07:44
@github-actions

🚀 Temp artifacts published: v0-0e89f55ba 🚀

@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from 0e89f55 to d94cbdd Compare October 16, 2025 15:24
@github-actions
Copy link

🚀 Temp artifacts published: v0-d94cbdd3c 🚀

@edipascale edipascale linked an issue Oct 21, 2025 that may be closed by this pull request
@edipascale
Contributor Author

@Frostman copying here what we discussed via slack so the information is all in one place:

The main issue I see with this PR is that the VNI space is shared between externals and VPCs. We allocate sequentially for all objects of one kind and then move on to the other kind; right now we do externals first and then VPCs. This means that:

  1. whenever objects of the kind allocated first are added or deleted, the second block gets shifted accordingly, causing unnecessary updates and conflicts. In our case, the gateway agent enters a crash loop if an external is added while there are VPCs, due to a VNI collision.
  2. if objects of both kinds exist when we do a fabric update, some of the VNIs to be allocated for the first kind will already be in use, and we'll get conflicts.
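Point 1 can be illustrated with a small Go sketch that compares the VNIs currently configured with the newly computed ones and lists every object whose new VNI is still held by another object. It assumes updates are pushed one object at a time, so each listed pair would briefly put the same VNI on two VRFs; the function name and sample VNIs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// collisions (hypothetical) returns, for every object whose desired VNI
// differs from its current one, the object that currently holds that VNI.
// Objects that do not exist yet have no current VNI.
func collisions(current, desired map[string]uint32) map[string]string {
	holder := make(map[uint32]string, len(current))
	for name, vni := range current {
		holder[vni] = name
	}
	out := make(map[string]string)
	for name, vni := range desired {
		if cur, ok := current[name]; ok && cur == vni {
			continue // unchanged, nothing to push
		}
		if h, ok := holder[vni]; ok {
			out[name] = h // pushing this update first would duplicate the VNI
		}
	}
	return out
}

func main() {
	// Externals are allocated first; adding external-02 shifts every VPC by one slot.
	current := map[string]uint32{"external-01": 100, "vpc-01": 200, "vpc-02": 300}
	desired := map[string]uint32{"external-01": 100, "external-02": 200, "vpc-01": 300, "vpc-02": 400}
	c := collisions(current, desired)
	names := make([]string, 0, len(c))
	for n := range c {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s wants VNI %d, still held by %s\n", n, desired[n], c[n])
	}
}
```

Here a single new external produces two conflicts (external-02 vs vpc-01, vpc-01 vs vpc-02), and the number grows with every VPC allocated after it.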

If we wanted to keep a single VNI space and still address the problems above, we could check the existing allocations first and only allocate VNIs for new objects / release those of removed ones, but that complicates the code and (for deletions) forces us to deal with gaps in the allocated space. Of course, none of this would be a problem if we had proper declarative APIs and the ability to replace the config in one go...
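A minimal Go sketch of that alternative: keep every existing assignment whose object still exists, release the VNIs of deleted objects, and give each new object the lowest free slot, so creating or deleting one object never renumbers the others. The function name, the slot layout (base plus a multiple of stride) and the capacity limit are assumptions for illustration; note how a freed slot is reused, which is exactly the gap handling mentioned above.

```go
package main

import (
	"fmt"
	"sort"
)

// allocateStable (hypothetical) keeps existing assignments for objects that
// still exist, frees the VNIs of deleted objects, and assigns each new object
// the lowest free slot base+i*stride for i in [0, capacity).
func allocateStable(existing map[string]uint32, wanted []string, base, stride uint32, capacity int) (map[string]uint32, error) {
	want := make(map[string]bool, len(wanted))
	for _, name := range wanted {
		want[name] = true
	}
	out := make(map[string]uint32, len(wanted))
	used := make(map[uint32]bool)
	for name, vni := range existing {
		if want[name] { // deleted objects are simply dropped, freeing their VNI
			out[name] = vni
			used[vni] = true
		}
	}
	names := append([]string(nil), wanted...)
	sort.Strings(names) // deterministic order for new objects
	for _, name := range names {
		if _, ok := out[name]; ok {
			continue
		}
		assigned := false
		for i := 0; i < capacity && !assigned; i++ {
			if vni := base + uint32(i)*stride; !used[vni] {
				out[name] = vni
				used[vni] = true
				assigned = true
			}
		}
		if !assigned {
			return nil, fmt.Errorf("no free VNI left for %q", name)
		}
	}
	return out, nil
}

func main() {
	existing := map[string]uint32{"external-01": 100, "vpc-01": 200, "vpc-02": 300}
	// vpc-01 is deleted and external-02 is created in the same pass.
	got, err := allocateStable(existing, []string{"external-01", "vpc-02", "external-02"}, 100, 100, 16)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

In this run external-02 reuses the slot freed by vpc-01 (200), while external-01 and vpc-02 keep 100 and 300.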

@edipascale edipascale force-pushed the ema/externals-as-vpcs branch from d94cbdd to 5630063 Compare October 29, 2025 12:16
@edipascale
Contributor Author

@Frostman I took your suggestion to do this the same way as the loopback workaround and gave it a shot myself. It's ugly because I don't want to change how we store VNIs for VPCs (for backwards compatibility), so we have to do a double translation, but it appears to work. I'm sure it could be optimized further, but at least it will let us move this forward faster. I've kept it as a separate commit anyway, in case we want to take a different approach.
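One plausible shape for that kind of translation, sketched in Go: VPC entries keep their bare names as keys (so existing catalogs stay valid), while externals share the same map under an `ext@`-prefixed key, a prefix the CRD descriptions in the file summary below also mention. How the branch actually applies the prefix is an assumption here, and the helper names are hypothetical. Because `@` is not allowed in Kubernetes object names, a prefixed external key cannot clash with a VPC name.

```go
package main

import (
	"fmt"
	"strings"
)

// extKeyPrefix mirrors the "ext@" prefix mentioned in the catalog CRD
// descriptions; the exact way the branch applies it is assumed.
const extKeyPrefix = "ext@"

// externalKey (hypothetical) returns the catalog key for an external. VPC keys
// remain bare names, so catalogs written before this change are still valid.
func externalKey(name string) string { return extKeyPrefix + name }

// splitCatalog (hypothetical) translates a unified key->VNI catalog back into
// separate per-kind maps, stripping the prefix from external entries.
func splitCatalog(all map[string]uint32) (vpcs, externals map[string]uint32) {
	vpcs = make(map[string]uint32)
	externals = make(map[string]uint32)
	for key, vni := range all {
		if strings.HasPrefix(key, extKeyPrefix) {
			externals[strings.TrimPrefix(key, extKeyPrefix)] = vni
		} else {
			vpcs[key] = vni
		}
	}
	return vpcs, externals
}

func main() {
	catalog := map[string]uint32{
		"vpc-01":                   200,
		externalKey("external-01"): 100,
	}
	vpcs, externals := splitCatalog(catalog)
	fmt.Println("vpcs:", vpcs, "externals:", externals)
}
```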

@github-actions

🚀 Temp artifacts published: v0-5630063a1 🚀

edipascale and others added 4 commits November 2, 2025 22:28
add l3vni to external VRFs, advertise routes we learn from
the external (filtered by BGP community) over EVPN. This
way the gateway will see them just as another VPC and we
will be able to do external peering via GW with no special
code.

also add the IRB VLAN, otherwise routes from the external will not
be advertised over BGP EVPN.

for the external outbound route-map, pre-emptively allow any prefix
belonging to the external's ipv4namespace, as opposed to selectively
adding VPC prefixes to a prefix-list depending on the external
peering. this way, routes learned via the gateway will be advertised
to the external, which allows traffic to be routed back to the VPC.

place external VNIs and IRB VLANs before the VPCs, as they share
the same space and are less likely to change dynamically. This way
we will not see a different external VNI/VLAN every time we create
or delete a VPC.

make the catalog errors warnings for now, to allow agent upgrade.
these should be made into full blown errors eventually.

Signed-off-by: Emanuele Di Pascale <[email protected]>
Signed-off-by: Emanuele Di Pascale <[email protected]>
@Frostman Frostman force-pushed the ema/externals-as-vpcs branch from 5630063 to abafe6f Compare November 3, 2025 07:39
@Frostman Frostman force-pushed the ema/externals-as-vpcs branch from abafe6f to d69beae Compare November 3, 2025 07:40
@github-actions

github-actions bot commented Nov 3, 2025

🚀 Temp artifacts published: v0-d69beaef5 🚀

@Frostman Frostman requested a review from Copilot November 3, 2025 07:49
@Frostman Frostman added ci:+hlab Enable hybrid VLAB tests ci:+release Enable VLAB release tests labels Nov 3, 2025

Copilot AI left a comment


Pull Request Overview

This PR extends the VNI (Virtual Network Identifier) allocation system to support External resources in addition to VPCs. The changes unify VPC and External VNI management under a common catalog, ensuring both resource types receive proper VNI allocations and IRB VLAN assignments.

Key changes:

  • Renamed UpdateVPCs to UpdateVNIs to reflect that it now handles both VPCs and Externals
  • Consolidated catalog name from CatVPCs to CatVNIs with updated documentation
  • Added GetExternalVNI method and new GwExternalSync controller for External resource reconciliation
  • Updated prefix constants from LoWorkaround* to cleaner Req* names

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| pkg/manager/librarian/librarian.go | Core logic changes: renamed UpdateVPCs to UpdateVNIs, added External VNI allocation, updated catalog handling for both VPCs and Externals |
| pkg/hhfctl/inspect/conn.go | Updated constant references from `LoWorkaroundReqPrefix*` to `ReqPrefix*` |
| pkg/ctrl/vpc_ctrl.go | Updated VPC reconciler to call renamed UpdateVNIs method |
| pkg/ctrl/gw_sync.go | Added GwExternalSync reconciler for External resources with VPCInfo synchronization |
| pkg/ctrl/agent_ctrl.go | Updated agent reconciler to handle External VNIs and call UpdateVNIs |
| pkg/agent/dozer/bcm/plan.go | Removed unused prefix list logic, added IRB VLAN and VNI configuration for externals, updated to use ReqForExt |
| config/rbac/role.yaml | Consolidated RBAC rules for externals and vpcs finalizers |
| config/crd/bases/agent.githedgehog.com_catalogs.yaml | Updated field descriptions to mention Externals with ext@ prefix |
| config/crd/bases/agent.githedgehog.com_agents.yaml | Updated field descriptions to mention Externals with ext@ prefix |
| cmd/main.go | Added setup call for GwExternalSync controller |
| api/vpc/v1beta1/vpc_types.go | Added validation to prevent VPC names from starting with VPCInfoExtPrefix |
| api/vpc/v1beta1/external_types.go | Added VPCInfoExtPrefix constant definition |
| api/agent/v1beta1/catalog_types.go | Updated field comments to document External inclusion |
Comments suppressed due to low confidence (1)

pkg/ctrl/gw_sync.go:214

  • The hardcoded subnet entry with key "external" and CIDR "0.0.0.0/0" is a placeholder as indicated by the FIXME comment above. This magic string and default route could cause issues if other code depends on accurate subnet information. Consider using a more descriptive key or documenting this placeholder pattern more explicitly until the proper implementation is added.



@github-actions

github-actions bot commented Nov 3, 2025

🚀 Temp artifacts published: v0-567c14e97 🚀

@Frostman Frostman self-requested a review November 3, 2025 09:51
@Frostman Frostman merged commit 2cafcbd into master Nov 3, 2025
16 checks passed
@Frostman Frostman deleted the ema/externals-as-vpcs branch November 3, 2025 15:31

Labels

ci:+hlab Enable hybrid VLAB tests ci:+release Enable VLAB release tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose BGP externals as VPCs for gateway peering

3 participants