openova/infra
e3mrah 0538f6ee68
fix(infra): advertise public IP as k3s node-external-ip so Cilium inter-region tunnel works (Refs TBD-A7) (#1715)
Add --node-external-ip=$${CP_PUBLIC_IPV4} to the k3s server install in
infra/hetzner/cloudinit-control-plane.tftpl so every CP publishes BOTH
node.status.addresses[InternalIP=10.0.1.2] AND ExternalIP=<public ipv4>.
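
As a rough sketch of what the rendered install arguments might look like once the template is evaluated: the variable values below are placeholders (203.0.113.10 is a documentation-range address), CP_PRIVATE_IPV4 is a hypothetical name, and the real template may assemble the command differently. In a .tftpl file, `$${...}` is the escape that leaves a literal shell `${...}` in the rendered output.

```shell
# Placeholder values for illustration only; the real ones are resolved when the
# rendered cloud-init script runs on the control-plane node.
CP_PRIVATE_IPV4="10.0.1.2"        # hypothetical variable name
CP_PUBLIC_IPV4="203.0.113.10"     # documentation-range placeholder

# --node-ip keeps the private address as InternalIP;
# --node-external-ip additionally publishes the public address as ExternalIP.
K3S_EXEC="server --node-ip=${CP_PRIVATE_IPV4} --node-external-ip=${CP_PUBLIC_IPV4}"
echo "${K3S_EXEC}"

# The actual install step (not executed in this sketch) would pass the string on:
# curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="${K3S_EXEC}" sh -
```

Both flags are standard k3s options; everything else in the block is assumed for the sake of the example.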

Bug evidence (Wave 28-E, t22-omantel-biz 2026-05-18):
  hel/fsn/sin all advertise InternalIP=10.0.1.2 with NO ExternalIP.
  After the 2026-05-15 per-region-network refactor every region's CP
  sits in its OWN isolated hcloud_network, so 10.0.1.2 is locally
  scoped on each VPS and NOT routable cross-region. Cilium picks the
  InternalIP as its tunnel endpoint by default → cross-region VXLAN
  tunnels resolve to 10.0.1.2 on every peer → inter-region pod traffic
  blackholes (pod-to-pod 0/6 across regions).
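
The missing-ExternalIP symptom can be checked per node with something like the sketch below. The helper name has_external_ip and the NODE variable are hypothetical; the commented kubectl line assumes access to the affected cluster.

```shell
# has_external_ip: reads "Type=Address" lines on stdin and succeeds
# only if at least one ExternalIP entry is present.
has_external_ip() {
  grep -q '^ExternalIP='
}

# Against a live cluster (assumes kubectl access; NODE is a hypothetical variable):
# kubectl get node "$NODE" \
#   -o jsonpath='{range .status.addresses[*]}{.type}={.address}{"\n"}{end}' \
#   | has_external_ip

# Offline illustration using the address set described in the bug evidence:
printf 'InternalIP=10.0.1.2\nHostname=cp-hel\n' | has_external_ip \
  || echo "no ExternalIP advertised"
```

The hostname in the sample input is made up; only the InternalIP value comes from the report above.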

docs/SOVEREIGN-MULTI-REGION-DOD.md A2 mandate:
  "inter-region link = DMZ WireGuard over PUBLIC IPs ALWAYS
   (never any provider's private network)".

Publishing the public IPv4 as ExternalIP lets Cilium promote it to the
tunnel endpoint when peer addresses include External + Internal, which
restores cross-region pod reachability without breaking intra-cluster
paths. InternalIP stays primary for kube-apiserver advertise and
pod-to-CP dial (the original reason --node-ip was pinned to the private
address in the PR #62 era; the comment at lines 1370-1378 still holds
and is preserved).

Effect:
  - Only takes effect on FRESH provisions (t23+). The already-deployed
    t22 cannot be remediated by a cloud-init change.
  - Both primary CP and secondary CPs go through this same template
    (main.tf templatefile() calls for primary at line 636 and per
    secondary at line 1187), so a single template edit covers all
    regions.
  - Approach A (smaller / immediate). Approach B (DMZ WireGuard overlay
    DaemonSet per platform/bp-dmz-vcluster/) follows as architectural
    follow-up if A alone doesn't fully resolve cross-region pod
    traffic on t23+.

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:58:14 +04:00
cloudflare-worker-leases feat(continuum): K-Cont-4 — Cloudflare Worker source + tofu wiring for lease witness (#1101) (#1159) 2026-05-09 08:01:44 +04:00
hetzner fix(infra): advertise public IP as k3s node-external-ip so Cilium inter-region tunnel works (Refs TBD-A7) (#1715) 2026-05-18 18:58:14 +04:00