The baseline-default-deny CiliumNetworkPolicy in catalyst-system listed
14 platform namespaces in its egress allow-list (keycloak, gitea,
powerdns, cnpg-system, openbao, harbor, nats-system, loki, mimir, tempo,
alloy, opentelemetry, external-secrets-system, cert-manager) but did NOT
include `sme`. The bp-sme-platform chart deploys the SME control-plane
into namespace `sme`, and console in catalyst-system reaches
`gateway.sme.svc.cluster.local:8080` for every voucher list / issue /
redeem call (plus admin reaches the same gateway for tenant onboarding).
Every such call was therefore dropped at the egress hook, timed out
after 5s, and surfaced to the operator as a 503 `context deadline
exceeded` on the voucher list / voucher issue panels.
Reproduction on t32 (2026-05-19, fresh prov, READ-ONLY):
  $ kubectl exec -n catalyst-system catalyst-api-59d5cf5644-wrg4x \
      -- curl -m 5 http://gateway.sme.svc.cluster.local:8080/healthz
  000 time=5.002937
  curl: (28) Connection timed out after 5002 milliseconds
Live CNP egress excerpt (kubectl get cnp -n catalyst-system
baseline-default-deny -o yaml | yq '.spec.egress[3]'):
  toEndpoints:
    - matchExpressions:
        - key: k8s:io.kubernetes.pod.namespace
          operator: In
          values:
            - keycloak ... - cert-manager   # (no 'sme')
Fix: add `sme` to BOTH the values.yaml default
(`.Values.security.baselineCnp.allowedPlatformNamespaces`) AND the
template's `default (list ...)` fallback, so a Helm install with no
values overrides still renders the allow.
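One consequence worth spelling out: Helm replaces list values wholesale rather than merging them, and the template's `default` only applies when the value is empty. A per-Sovereign override of `allowedPlatformNamespaces` therefore has to repeat `sme` itself. A minimal sketch of such an override file, where the file name and the `tenant-extras` namespace are hypothetical and only the key path and the 15 platform namespaces come from this change:

```yaml
# values-override.yaml (hypothetical file name)
security:
  baselineCnp:
    enabled: true
    # An override replaces the chart's default list entirely, so every
    # namespace that must stay reachable has to be listed again here.
    allowedPlatformNamespaces:
      - keycloak
      - gitea
      - powerdns
      - cnpg-system
      - openbao
      - harbor
      - nats-system
      - loki
      - mimir
      - tempo
      - alloy
      - opentelemetry
      - external-secrets-system
      - cert-manager
      - sme            # required for console -> gateway.sme egress
      - tenant-extras  # hypothetical extra namespace, for illustration only
```

An override that omits `sme` would render a CNP equivalent to the pre-fix one, with the same voucher timeouts.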
Originally masqueraded under #1748 (voucher list 503) and #1749 (voucher
issue 503) — those were thought to be services-build 502 regressions,
but this is a distinct CNP-misconfig bug class.
Validation:
- `helm template` confirms rendered CNP now lists `sme` in egress.
- `kubectl apply --dry-run=server` against t32 apiserver passes
("ciliumnetworkpolicy.cilium.io/baseline-default-deny configured").
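For reference, the namespace selector in the rendered rule is expected to look roughly like the sketch below. It is reconstructed from the live excerpt above plus the new entry, not copied from actual `helm template` output, and any other fields of the rule are left out because they are not shown in this change.

```yaml
# Sketch of the platform-namespace egress selector after the fix
# (reconstructed for illustration, not captured output).
toEndpoints:
  - matchExpressions:
      - key: k8s:io.kubernetes.pod.namespace
        operator: In
        values:
          - keycloak
          - gitea
          - powerdns
          - cnpg-system
          - openbao
          - harbor
          - nats-system
          - loki
          - mimir
          - tempo
          - alloy
          - opentelemetry
          - external-secrets-system
          - cert-manager
          - sme   # new in 1.4.189
```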
Chart bumped 1.4.188 → 1.4.189; bootstrap-kit pin bumped to match.
No live patching on t32 — fix verified via server-side dry-run only,
per Principle #15.
Closes #1917
Refs #1748
Refs #1749
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
parent 446da60ca4
commit bf577e9d7b
@@ -608,7 +608,7 @@ spec:
 # during the YAML scanner break introduced by PR #1858 and fixed
 # by PR #1866. Auto-bump-pin step didn't fire during the outage,
 # so this pin lagged by 2 versions. Refs #1864.
-version: 1.4.188
+version: 1.4.189
 sourceRef:
 kind: HelmRepository
 name: bp-catalyst-platform
@@ -1,5 +1,27 @@
 apiVersion: v2
 name: bp-catalyst-platform
+# 1.4.189 — TBD-A38 (issue #1917) baseline-default-deny CNP egress
+# allow-list extended with `sme` namespace. The bp-sme-platform chart
+# deploys the SME control-plane (auth, billing, catalog, console,
+# domain, gateway, marketplace, notification, provisioning, tenant)
+# into namespace `sme`. Console in catalyst-system reaches
+# `gateway.sme.svc.cluster.local:8080` for every voucher list / issue /
+# redeem call (and admin reaches the same gateway for tenant onboarding).
+# Pre-fix the egress list contained {keycloak, gitea, powerdns,
+# cnpg-system, openbao, harbor, nats-system, loki, mimir, tempo, alloy,
+# opentelemetry, external-secrets-system, cert-manager} — `sme` was
+# omitted. Every console→sme-gateway call timed out at 5s and surfaced
+# at the operator as a 503 `context deadline exceeded` on the voucher
+# list / voucher issue panels. Reproduced on t32 2026-05-19 from inside
+# catalyst-api Pod:
+# curl -m 5 http://gateway.sme.svc.cluster.local:8080/healthz
+# curl: (28) Connection timed out after 5002 milliseconds
+# Originally masqueraded under #1748 (voucher list 503) and #1749
+# (voucher issue 503), which were thought to be services-build 502
+# regressions. Isolated as a distinct bug class — pure CNP misconfig.
+# Both the values.yaml default and the template `default (list ...)`
+# fallback were updated so a Helm install with NO values overrides still
+# renders the `sme` allow.
 # 1.4.187 — TBD-X1 (issue #1793) notification.yaml SMTP_USER / SMTP_PASS
 # env wiring. Pre-fix the notification Pod read only SMTP_HOST / PORT /
 # FROM from sme-secrets, so the Go net/smtp client dialed Stalwart
@@ -1308,7 +1330,7 @@ name: bp-catalyst-platform
 # 25/TCP (legacy SMTP fallback). All three are explicitly scoped to
 # `toEntities: world`, matching the existing 443/TCP allow. No other
 # rule semantics change. (Fixes PIN-issue 502 regression from #1785.)
-version: 1.4.188
+version: 1.4.189
 appVersion: 1.4.188
 # 1.4.183 — fix(httproute): omit default sectionName so multi-zone
 # Sovereigns attach via Cilium Gateway hostname matcher (Closes #1884,
@@ -70,7 +70,7 @@ unconditionally on every Sovereign and protects catalyst-system even
 when qaFixtures is disabled.
 */}}
 {{- if .Values.security.baselineCnp.enabled }}
-{{- $allowedPlatform := .Values.security.baselineCnp.allowedPlatformNamespaces | default (list "keycloak" "gitea" "powerdns" "cnpg-system" "openbao" "harbor" "nats-system" "loki" "mimir" "tempo" "alloy" "opentelemetry" "external-secrets-system" "cert-manager") }}
+{{- $allowedPlatform := .Values.security.baselineCnp.allowedPlatformNamespaces | default (list "keycloak" "gitea" "powerdns" "cnpg-system" "openbao" "harbor" "nats-system" "loki" "mimir" "tempo" "alloy" "opentelemetry" "external-secrets-system" "cert-manager" "sme") }}
 {{- $allowedIngressNs := .Values.security.baselineCnp.allowedIngressNamespaces | default (list "catalyst" "flux-system" "kube-system") }}
 apiVersion: cilium.io/v2
 kind: CiliumNetworkPolicy
@@ -1543,6 +1543,17 @@ security:
 enabled: true
 # Platform namespaces catalyst-system Pods are allowed to egress to.
 # Override to add tenant-specific or per-Sovereign namespaces.
+#
+# `sme` — the bp-sme-platform chart deploys the SME control-plane
+# (auth, billing, catalog, console, domain, gateway, marketplace,
+# notification, provisioning, tenant) into namespace `sme`. Console
+# in catalyst-system reaches `gateway.sme.svc.cluster.local:8080`
+# for every voucher list/issue/redeem call (and admin reaches the
+# same gateway for tenant onboarding). Without this allow, console
+# requests time out at 5s and return 503 `context deadline exceeded`
+# — surfaced on t32 (2026-05-19) as the masquerading root cause of
+# issues #1748 (voucher list) and #1749 (voucher issue). See
+# TBD-A38 / #1917 for the full reproduction.
 allowedPlatformNamespaces:
 - keycloak
 - gitea
@@ -1558,6 +1569,7 @@ security:
 - opentelemetry
 - external-secrets-system
 - cert-manager
+- sme
 # Adjacent namespaces whose Pods are allowed to INGRESS into
 # catalyst-system. Defaults cover:
 # - catalyst — bp-self-sovereign-cutover Jobs (incl. the