openova/clusters
e3mrah bf577e9d7b
fix(bp-sme): allow egress from catalyst-system to gateway:8080 (TBD-A38, Closes #1917) (#1919)
The baseline-default-deny CiliumNetworkPolicy in catalyst-system listed
14 platform namespaces in its egress allow-list (keycloak, gitea,
powerdns, cnpg-system, openbao, harbor, nats-system, loki, mimir, tempo,
alloy, opentelemetry, external-secrets-system, cert-manager) but did NOT
include `sme`. The bp-sme-platform chart deploys the SME control-plane
into namespace `sme`, and console in catalyst-system reaches
`gateway.sme.svc.cluster.local:8080` for every voucher list / issue /
redeem call (plus admin reaches the same gateway for tenant onboarding).
Every such call was therefore dropped at the egress hook and timed out
after 5s, surfacing to the operator as 503 `context deadline exceeded`
on the voucher list / voucher issue panels.

Reproduction on t32 (2026-05-19, fresh prov, READ-ONLY):

  $ kubectl exec -n catalyst-system catalyst-api-59d5cf5644-wrg4x \
      -- curl -m 5 http://gateway.sme.svc.cluster.local:8080/healthz
  000 time=5.002937
  curl: (28) Connection timed out after 5002 milliseconds

Live CNP egress excerpt (kubectl get cnp -n catalyst-system
baseline-default-deny -o yaml | yq '.spec.egress[3]'):

  toEndpoints:
    - matchExpressions:
        - key: k8s:io.kubernetes.pod.namespace
          operator: In
          values:
            - keycloak  ... - cert-manager   # (no 'sme')

Fix: add `sme` to BOTH the values.yaml default
(`.Values.security.baselineCnp.allowedPlatformNamespaces`) AND the
template's `default (list ...)` fallback, so a Helm install with no
values overrides still renders the allow.
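
For illustration, the values.yaml default after this change might look
like the sketch below. The key path and the 14 existing namespaces are
taken from the text above; the position of `sme` in the list and the
absence of sibling keys are assumptions, and the template's
`default (list ...)` fallback would need the same entry.

```yaml
# Sketch of the values.yaml default, not a copy of the chart file.
# Assumption: `sme` is appended last; the real ordering may differ.
security:
  baselineCnp:
    allowedPlatformNamespaces:
      - keycloak
      - gitea
      - powerdns
      - cnpg-system
      - openbao
      - harbor
      - nats-system
      - loki
      - mimir
      - tempo
      - alloy
      - opentelemetry
      - external-secrets-system
      - cert-manager
      - sme  # new: namespace where bp-sme-platform deploys the gateway
```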

This bug originally masqueraded as #1748 (voucher list 503) and #1749
(voucher issue 503). Those were thought to be services-build 502
regressions, but this is a distinct CNP-misconfig bug class.

Validation:
- `helm template` confirms rendered CNP now lists `sme` in egress.
- `kubectl apply --dry-run=server` against t32 apiserver passes
  ("ciliumnetworkpolicy.cilium.io/baseline-default-deny configured").

Chart bumped 1.4.188 → 1.4.189; bootstrap-kit pin bumped to match.
No live patching on t32 — fix verified via server-side dry-run only,
per Principle #15.

Closes #1917
Refs #1748
Refs #1749

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
2026-05-19 10:49:47 +04:00
_template fix(bp-sme): allow egress from catalyst-system to gateway:8080 (TBD-A38, Closes #1917) (#1919) 2026-05-19 10:49:47 +04:00
contabo-mkt/tenants provision: deploy tenant e2e-wp-test (plan: m, apps: 1) 2026-05-06 02:23:14 +04:00
omantel.omani.works fix(bp-cert-manager): add CRD-establishment gate to close ClusterIssuer race (#149) (#1355) 2026-05-11 08:28:06 +04:00
otech.omani.works fix(bp-cert-manager): add CRD-establishment gate to close ClusterIssuer race (#149) (#1355) 2026-05-11 08:28:06 +04:00