The baseline-default-deny CiliumNetworkPolicy in catalyst-system listed
14 platform namespaces in its egress allow-list (keycloak, gitea,
powerdns, cnpg-system, openbao, harbor, nats-system, loki, mimir, tempo,
alloy, opentelemetry, external-secrets-system, cert-manager) but did NOT
include `sme`. The bp-sme-platform chart deploys the SME control-plane
into namespace `sme`, and console in catalyst-system reaches
`gateway.sme.svc.cluster.local:8080` for every voucher list / issue /
redeem call (plus admin reaches the same gateway for tenant onboarding).
Every such call was therefore dropped at the egress hook and timed out
at 5s, surfacing to the operator as a 503 `context deadline exceeded` on
the voucher list / voucher issue panels.

Reproduction on t32 (2026-05-19, fresh prov, READ-ONLY):

    $ kubectl exec -n catalyst-system catalyst-api-59d5cf5644-wrg4x \
        -- curl -m 5 http://gateway.sme.svc.cluster.local:8080/healthz
    000 time=5.002937
    curl: (28) Connection timed out after 5002 milliseconds

Live CNP egress excerpt (kubectl get cnp -n catalyst-system
baseline-default-deny -o yaml | yq '.spec.egress[3]'):

    toEndpoints:
    - matchExpressions:
      - key: k8s:io.kubernetes.pod.namespace
        operator: In
        values:
        - keycloak ... - cert-manager   # (no 'sme')

Fix: add `sme` to BOTH the values.yaml default
(`.Values.security.baselineCnp.allowedPlatformNamespaces`) AND the
template's `default (list ...)` fallback, so a Helm install with no
values overrides still renders the allow.
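
As a rough illustration of the values-side half of the fix, the default
list would look something like the sketch below. Only the key path and
the namespace names come from the text above; the surrounding layout of
values.yaml is an assumption, and the template's `default (list ...)`
fallback needs the same extra entry.

```yaml
# values.yaml (sketch; file layout assumed, key path taken from the text above)
security:
  baselineCnp:
    allowedPlatformNamespaces:
      - keycloak
      - gitea
      - powerdns
      - cnpg-system
      - openbao
      - harbor
      - nats-system
      - loki
      - mimir
      - tempo
      - alloy
      - opentelemetry
      - external-secrets-system
      - cert-manager
      - sme  # new entry: allows egress to gateway.sme.svc.cluster.local:8080
```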

Originally masqueraded under #1748 (voucher list 503) and #1749 (voucher
issue 503); those were thought to be services-build 502 regressions,
but this is a distinct CNP-misconfig bug class.

Validation:
- `helm template` confirms rendered CNP now lists `sme` in egress.
- `kubectl apply --dry-run=server` against t32 apiserver passes
  ("ciliumnetworkpolicy.cilium.io/baseline-default-deny configured").

Chart bumped 1.4.188 → 1.4.189; bootstrap-kit pin bumped to match.
No live patching on t32; the fix was verified via server-side dry-run
only, per Principle #15.

Closes #1917
Refs #1748
Refs #1749
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
922 lines
56 KiB
YAML
# bp-catalyst-platform — Catalyst Blueprint #13 of 13. The umbrella
# Blueprint that brings up the Catalyst control plane: console, marketplace,
# admin, catalog-svc, projector, provisioning, environment-controller,
# blueprint-controller, billing.
#
# Per docs/ARCHITECTURE.md §11 (Catalyst-on-Catalyst): once this is Ready,
# the Sovereign is fully self-sufficient — sovereign-admin can log into
# console.${SOVEREIGN_FQDN} and proceed with Phase 2 day-1 setup.
#
# Wrapper chart: products/catalyst/chart/

---
apiVersion: v1
kind: Namespace
metadata:
  name: catalyst-system
  labels:
    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-catalyst-platform
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-catalyst-platform
  namespace: flux-system
  labels:
    # slot encodes bootstrap-kit install order; component=catalyst-platform
    # overrides the default Phase 1 mapping so this HR lands in
    # Phase 3 (Sovereign Live) on the openova-flow canvas — once
    # Ready=True the Sovereign is fully self-sufficient.
    catalyst.openova.io/slot: "13"
    catalyst.openova.io/component: catalyst-platform
spec:
  interval: 15m
  releaseName: catalyst-platform
  targetNamespace: catalyst-system
  dependsOn:
    - name: bp-gitea
    # bp-gateway-api (issue #503): umbrella chart ships catalyst-ui +
    # catalyst-api HTTPRoute templates; gateway.networking.k8s.io/v1
    # CRDs must be registered first.
    - name: bp-gateway-api
    # bp-keycloak + bp-cnpg (issue #512): the catalyst-platform umbrella
    # post-install Jobs bootstrap OIDC clients in Keycloak and seed
    # PostgreSQL schemas for catalog-svc / projector / billing /
    # provisioning. Both Keycloak and cnpg take 5+ minutes to reach Ready
    # on a fresh Sovereign — without an explicit dep, the umbrella's
    # hook starts before they're warm and times out at 15m.
    # Phase-8a-preflight otech16 (2026-05-02): adding bp-keycloak +
    # bp-cnpg here makes Flux wait for both Ready=True before starting
    # the umbrella install, eliminating the race.
    - name: bp-keycloak
    - name: bp-cnpg
    # bp-crossplane-claims (chart-roll-rca iter-15, 2026-05-10): owns the
    # access.openova.io/v1alpha1 XRD that qa-fixtures UserAccess CRs
    # require. Without this dep, slot 13 races slot 14 and the umbrella
    # upgrade fails admission with `no matches for kind "UserAccess" in
    # version "access.openova.io/v1alpha1"`. The release Secret then
    # enters `pending-upgrade` and waits the full 15m timeout × 3 retries
    # before any operator-visible failure (the 2026-05-10 omantel.biz
    # 90-min wedge). With this edge, the chart never enters the failing
    # state on a fresh roll.
    - name: bp-crossplane-claims
  chart:
    spec:
      chart: bp-catalyst-platform
      # 1.4.0 (issue #827): adds per-zone wildcard Certificate template.
      # When `parentZones` is populated the chart renders one
      # cert-manager.io/v1.Certificate per zone in kube-system; the
      # Cilium Gateway listeners reference the per-zone Secrets. When
      # `parentZones` is empty (legacy single-zone Sovereign) the chart
      # falls back to a single Certificate covering `*.<sovereignFQDN>`
      # so existing provisioning paths keep working.
      # 1.4.1 (PR #839): RBAC dual-mode render fix (Helm + Kustomize).
      # 1.4.2 (PR #841): POWERDNS env literal (no envsubst-mid-render).
      # 1.4.3 (issue #859): auto-provision sme-pg CNPG Cluster +
      # sme-secrets when ingress.marketplace.enabled=true so SME
      # services land Ready on a fresh Sovereign without hand-rolled
      # SealedSecrets. Catalyst-Zero (contabo) keeps its pre-existing
      # clusters/contabo-mkt/apps/sme/data/* manifests — those are
      # outside templates/kustomization.yaml's resource list so the
      # contabo Kustomize-mode build is unaffected.
      # 1.4.4 (issue #861): deploy FerretDB in `sme` ns + cross-ns
      # CiliumNetworkPolicy from sme → valkey. Unblocks the 4 SME
      # services (catalog, tenant, domain, provisioning) that pin to
      # ferretdb.sme.svc.cluster.local for the MongoDB wire and the 2
      # services (auth, gateway) that pin to valkey for session/state.
      # cnpg-cluster.yaml extended to bootstrap sme_documents (FerretDB
      # backing DB) alongside sme_billing.
      # 1.4.5 (issue #863): mirror bp-valkey's auto-generated auth
      # password from `valkey/valkey` Secret into `sme/sme-valkey-auth`
      # via Helm lookup, and wire VALKEY_PASSWORD into auth + gateway
      # Deployments. Clears the NOAUTH HELLO crashloop that started
      # appearing after 1.4.4 made cross-ns Valkey reachable.
      # 1.4.6 (issue #863 follow-up): rebuild chart artifact to bundle
      # the rebuilt services-auth + services-gateway image (SHA fa4395f)
      # that contains the ConnectValkeyWithAuth Go change. 1.4.5 shipped
      # with the OLD image SHA baked in due to a race between the
      # blueprint-release pipeline and the services-build deploy step.
      # 1.4.7 (issue #866): mirror the gitea-admin password into
      # `sme/provisioning-github-token` so the last 1/13 SME pod
      # (provisioning) reaches Running 1/1 on a fresh Sovereign,
      # completing the SME stack 12/13 → 13/13. Same lookup-and-mirror
      # pattern as valkey-cross-ns-secret.yaml (#863).
      # 1.4.8 (issue #868): fix marketplace UI PIN-signin — /api/*
      # HTTPRoute now backendRefs sme/gateway:8080 (cross-namespace,
      # authorised by ReferenceGrant). The previous catalyst-system/
      # marketplace-api Service had zero backing Pods, so every signin
      # POST 503'd at the gateway. Pairs with services-auth route alias
      # /auth/send-pin → SendMagicLink (and /auth/verify-pin →
      # VerifyMagicLink) so the UI's PIN-naming reaches the existing
      # backend handler.
      # 1.4.13 (issue #882): NEW templates/sme-services/sme-tenants-
      # kustomization.yaml renders a Flux Kustomization in flux-system
      # that watches ./clusters/<sov-fqdn>/sme-tenants — the path the
      # catalyst-api SME-tenant orchestrator (sme_tenant_gitops.go)
      # commits per-tenant overlays to. Without this, POST
      # /api/v1/sme/tenants reached state=done optimistically but no
      # K8s resources materialised because nothing reconciled the
      # orchestrator's write target. Gated on
      # ingress.marketplace.enabled — non-marketplace Sovereigns don't
      # run the SME tenant pipeline.
      # 1.4.14 (issue #879 follow-up): chart-version-only republish to
      # bake catalyst-api image SHA 7bfd6df (the #879 fix commit) into
      # values.yaml. 1.4.13 OCI bytes still reference the OLD image SHA
      # because the deploy-bot updated values.yaml AFTER the chart was
      # published. Same deploy-step race documented in 1.4.6 / 1.4.9 /
      # 1.4.12 changelog.
      # 1.4.15 (issue #887): auto-provision marketplace-api-secrets
      # Secret on Sovereign install. templates/marketplace-api/
      # deployment.yaml referenced a secretKeyRef on
      # `marketplace-api-secrets` but the chart never rendered the
      # Secret — caught live on otech103, marketplace-api in
      # CreateContainerConfigError. Fix mirrors sme-secrets/
      # valkey-cross-ns-secret/provisioning-github-token Helm-lookup
      # persistence pattern. helm.sh/resource-policy: keep.
      # 1.4.16 (#893/#889 follow-up): chart-version-only republish to
      # bake catalyst-api image SHA 727fb2f (containing the parent-
      # kustomization.yaml index + helmrepositories.yaml emit + correct
      # per-blueprint sourceRef.name in tenant overlay templates) into
      # values.yaml. Without this bump the OCI artifact still references
      # the old image and the Sovereign's tenant orchestrator emits
      # tenant overlays with stale openova-blueprints sourceRef.
      # 1.4.17 (issue #901): unblock Sovereign Console login on every
      # fresh provision. 3-bug chain:
      # 1. NEW templates/catalyst-openova-kc-credentials-secret.yaml
      # auto-mirrors the canonical KC SA Secret (`keycloak/
      # catalyst-kc-sa-credentials`) into catalyst-system as
      # `catalyst-openova-kc-credentials` with the keys
      # api-deployment.yaml's PIN-auth env block expects. Gated on
      # `lookup "v1" "Secret" "keycloak" "catalyst-kc-sa-credentials"`
      # returning non-nil — renders only on Sovereign, skips on
      # contabo (which has its own hand-rolled Secret). Same Helm-
      # `lookup` persistence + `helm.sh/resource-policy: keep`
      # pattern as templates/marketplace-api/secret.yaml (#887).
      # 2. SMTP host/port/from defaults flow through .Values.sovereign.
      # smtp.* (mail.openova.io:587 / noreply@openova.io). SMTP
      # user/pass mirrored from `catalyst-system/sovereign-smtp-
      # credentials` (#883) when present.
      # 3. CATALYST_POST_AUTH_REDIRECT default flips from
      # /sovereign/wizard (mothership-only) to /sovereign/components
      # (post-handover Sovereign homepage). Per-Sovereign overlays
      # override via catalystApi.env additional-env patch.
      # 1.4.18 (issue #910): NEW templates/sme-services/sme-namespace.yaml
      # creates the `sme` namespace on Sovereigns where the marketplace
      # is enabled. Without this, chart 1.4.17 install failed 23 times
      # with `failed to create resource: namespaces "sme" not found` on
      # every fresh franchised Sovereign with marketplace.enabled=true —
      # caught live on otech105 (2026-05-05). Same dual-mode contract as
      # the rest of templates/sme-services/* (gated on
      # ingress.marketplace.enabled, excluded from kustomization.yaml's
      # resources: list).
      # 1.4.19 (issue #910 — Bugs 2 + 3): unblock Sovereign Console PIN-
      # login on a freshly franchised cluster.
      # Bug 2: CATALYST_SESSION_COOKIE_DOMAIN literal flips from
      # `console.openova.io` to `""` (empty). On a Sovereign the
      # request host is console.<sov-fqdn>, so the previous hardcoded
      # value made the browser reject Set-Cookie (RFC 6265 §5.3 step 6
      # Domain mismatch) and every /api/* request landed without a
      # session, redirecting to /login forever. Empty value contract
      # (Domain attribute omitted → cookie binds to request host) is
      # correct on BOTH Sovereign (console.<sov-fqdn>) AND contabo
      # (console.openova.io — wizard + magic-link served from the
      # same host). Per-Sovereign overlays MAY override via
      # catalystApi.env additional-env patch for unusual topologies.
      #
      # Bug 3: catalyst-openova-kc-credentials-secret.yaml's smtp-
      # user/smtp-pass lookup precedence inverts: SOURCE
      # (sovereign-smtp-credentials, seeded by A5's provisioner #883)
      # wins over the persisted target. Pre-1.4.19 target-wins meant
      # first-install rendered empty SMTP creds, persisted them, and
      # NEVER picked up A5's seeded bytes — POST /api/v1/auth/pin/
      # issue 502'd `email-send-failed` for the life of the cluster.
      # Source-wins makes every Flux 1m reconcile re-read the source.
      # KC fields keep "existing target wins" because bp-keycloak
      # auto-rotates the client-secret on every Helm upgrade and we
      # want that rotation to require explicit operator action
      # (delete the target) rather than auto-roll the catalyst-api
      # Pod.
      # 1.4.20 (#924): Phase-2 SMTP cutover. SOURCE-wins precedence
      # extended to (a) non-secret fields smtp-host/smtp-port/smtp-from
      # so the per-Sovereign Stalwart relay (`mail.<sovereignFQDN>`)
      # takes over from the mothership default (`mail.openova.io`) on
      # the next reconcile after slot 95 (bp-stalwart-sovereign) lands,
      # and (b) canonical key shape `smtp-user`/`smtp-pass` in addition
      # to the legacy `user`/`password` source key shape — the new
      # chart writes both shapes, this chart reads either.
      # 1.4.22 (#915 SME blockers): six chart + orchestrator fixes
      # unblocking alice signup gates 2-6 on franchised Sovereigns —
      # issues #934 (auth SMTP empty), #940 (provisioning placeholder
      # GITHUB_TOKEN + hardcoded upstream github.com), #941 (catalog
      # migrateAppDeployable missing openclaw + stalwart-mail), #942
      # (REDPANDA_BROKERS hardcoded to talentmesh — switched to NATS
      # JetStream on Sovereigns per ADR-0001), #943 (bp-newapi
      # silently skipped Deployment — paired bp-newapi 1.4.0 auto-
      # provisions CNPG cluster + credentials Secret), #944 (CRITICAL
      # cross-cluster pollution — GIT_BASE_PATH was hardcoded to
      # contabo-mkt; chart values now template per-Sovereign with
      # provisioning-binary Go-side validation guard refusing commits
      # to foreign cluster trees). 2026-05-05.
      # 1.4.23: deploy-bot auto-bump (services-auth image SHA roll).
      # 1.4.24 (#934 follow-up): smeSecrets.smtp.{host,port,from,user}
      # defaults populated with mothership relay (mail.openova.io:587)
      # so SME auth Pod's PIN delivery (gate 2) works on Sovereigns
      # whose A5-seeded sovereign-smtp-credentials Secret only carries
      # smtp-user + smtp-pass without host/port/from. 2026-05-05.
      # 1.4.25: deploy-bot auto-bump (sme-services 94ffe01 image roll).
      # 1.4.26 (#957 follow-up): catalyst-api-cutover-driver
      # ClusterRole gains `create tokenreviews.authentication.k8s.io`
      # so /api/v1/internal/cutover/trigger can validate the
      # auto-trigger Job's SA token via TokenReview. Without this rule
      # every trigger call returned 502 "token-review-failed" on
      # otech113 (chart 0.1.18 fixed the readiness loop but exposed
      # this missing-RBAC bug as the next failure). 2026-05-05.
      # 1.4.29 (#983 follow-up): Sovereign Console URL contract — clean
      # root URLs (/dashboard /jobs /cloud …), sovereign_self.go store
      # fallback (data renders the moment cutover-import lands without
      # waiting for the orchestrator's chart-values overlay write).
      # 2026-05-05.
      # 1.4.95 (qa-loop iter-3 Fix #18, #1206): clusterroles +
      # clusterrolebindings GVR added to k8scache.DefaultKinds + matching
      # get/list/watch verbs on catalyst-api-cutover-driver ClusterRole
      # (TC-122/196/199/248). Pairs with new CATALYST_BUILD_SHA +
      # CATALYST_CHART_VERSION env vars on api-deployment.yaml so
      # /api/v1/version returns the live SHA instead of `dev`/`0.0.0`
      # (TC-261).
      # 1.4.96 (qa-loop iter-3 Fix #18 follow-up): chart-packaging fix —
      # .helmignore excludes crds/tests/ so Helm's pre-render CRD install
      # no longer tries to apply the invalid Application sample as a CRD
      # (the test fixture introduced by PR #1105). Without this every
      # chart upgrade since 1.4.85 failed with `namespaces "acme" not
      # found` — caught live on omantel 2026-05-09 attempting 1.4.84 ->
      # 1.4.95. Bump pin so omantel + every other Sovereign sourcing
      # this template picks up the fix on the next reconcile.
      # 1.4.97 (qa-loop iter-4 Fix #24): apiextensions.k8s.io/v1
      # customresourcedefinitions GVR added to k8scache.DefaultKinds +
      # matching get/list/watch verbs on catalyst-api-cutover-driver
      # ClusterRole (TC-199). Pairs with UI heading rename "Install
      # Blueprint" → "Install — Blueprint Catalog" (TC-031). Per
      # feedback_chroot_in_cluster_fallback.md every new GVR added to
      # k8scache.DefaultKinds MUST get a matching rule on the cutover-
      # driver SA — the chroot SovereignClient uses this SA via
      # in-cluster fallback. Bump pin so omantel + every other Sovereign
      # sourcing this template picks up the fix on the next reconcile.
      # 1.4.99 (qa-loop iter-6 EPIC-6 Continuum DR target-state):
      # adds singular `/continuum/{name}` route family + 5 new endpoints
      # the matrix asserts (TC-312/324/326/329-335/339/343), seeds
      # cont-omantel/qa-cnpg/pdm-1..3 fixtures + status seeders, ships
      # cnpgpairs.dr.openova.io + pdms.dr.openova.io CRDs, ScheduledBackup
      # + Backup fixtures (TC-337/338), and bumps tier-operator
      # ClusterRole to grant continuums/cnpgpairs/pdms verbs (TC-344).
      # Bp-crossplane-claims 1.1.2 carries the matching tier-operator
      # extras update.
      # 1.4.104 (qa-loop iter-7 Cluster-C Fix #36, #1231): target-state
      # qa-fixtures stack (Org+Env+Blueprint+App) so application-controller
      # reconciles qa-wp end-to-end into a real nginx Pod. Bp-qa-app
      # sister chart at platform/qa-app/chart/ ships the real nginx
      # bytes (CI publishes oci://ghcr.io/openova-io/bp-qa-app:0.1.0).
      # Stacks on top of:
      # 1.4.103 (Fix #37 follow-up): qa-continuum-status-seed Job FQN fix
      # 1.4.106 (Fix #38 follow-up #3): qa-fixtures sovereignRef default
      # = "omantel.biz" so Organization + Environment + Application +
      # Blueprint + UserAccess validate against the FQDN pattern. Legacy
      # "omantel" rejected at admission and blocked the chart upgrade
      # even after the region-pattern fix.
      # 1.4.105 (Fix #38 follow-up): qa-fixtures Application + Environment
      # region defaults bumped to canonical 4-segment label
      # `hz-fsn-rtz-prod` so the qa-wp Application from Fix #36 (#1231)
      # validates against the CRD pattern `^[a-z]+-[a-z]+-[a-z]+-[a-z]+$`.
      # Without this fix, `spec.regions[0]: Invalid value: "fsn1"` rejected
      # the chart upgrade at admission and pinned omantel on the prior
      # image SHA, blocking Fix #38's TC-141/TC-090/TC-383 from rolling.
      # 1.4.102 (Fix #34 follow-up #1229): catalyst-api-cutover-driver
      # ClusterRole grants update/patch/delete on workload kinds + scale
      # subresources for the resource-action endpoints (PUT /k8s/.../scale,
      # /restart, etc.) so chroot in-cluster fallback authorises through
      # RBAC (TC-215, TC-218, TC-243, TC-247).
      # 1.4.101 (Fix #37): EPIC-6 + EPIC-1 target-state qa-fixtures closeout
      # — cnpg-clusters + Kyverno policy bundle.
      # 1.4.106 (qa-loop iter-7 Fix #38 follow-up #3+#4): Organization
      # sovereignRef = FQDN; bootstrap-kit defaults qaFixtures.sovereignRef
      # to ${SOVEREIGN_FQDN}; UserAccess sovereignRef strips dots.
      # 1.4.107 (qa-loop iter-8 Fix #40 — Cluster-A + Cluster-B): bp-wordpress
      # Blueprint CR alias resolved by chained catalog client; node-labels
      # seeder Job patches Nodes with topology.kubernetes.io/region short
      # form derived from openova.io/region; CNPGPair renamed to qa-cnpgpair.
      # 1.4.108 (qa-loop iter-8 Fix #41 — Cluster-A + Cluster-B closeout):
      # Environment region split into provider/region/buildingBlock per CRD
      # (TC-369); cluster-primary .spec.backup wired to in-cluster SeaweedFS
      # S3 with admin keys seeded into qa-omantel namespace so ScheduledBackup
      # succeeds (TC-338); cutover-driver ClusterRole adds kyverno.io read
      # for compliance handler ClusterPolicy ingest (TC-026).
      # 1.4.111 (qa-loop iter-8 Fix #42 + controller image bumps,
      # PRs #1252 + #1253 + #1254): closes 3 controller bugs blocking
      # qa-wp Pod spawn:
      # - org-controller: UserAccess Claim namespace (was empty → 422)
      # - env-controller: per-Env repo self-heal via EnsureRepo
      # - app-controller: host-side Flux GitRepository + Kustomization
      # upsert so per-Application manifests get pulled by Flux on
      # the host cluster.
      # Image tags bumped: org/env :1b29c71 → :72e3f08, app :3d1deef → :b321ada.
      # 1.4.112 (Fix #42 follow-up): env-controller now calls
      # EnsureBranch right after EnsureRepo so the env-type-mapped
      # branch (`develop` for envType=dev) exists before PutFile.
      # Without this Gitea returned 404 with "repository" in the body
      # mapping to ErrRepoNotFound, dropping the controller into a
      # permanent re-queue loop. Will need an env-controller image
      # rebuild before the chart actually consumes the fix.
      # 1.4.113 (Fix #42 image bump #2): env+app controllers bumped to
      # :a3ba200 — env-controller has EnsureBranch (PR #1257);
      # app-controller drops cross-namespace ownerRefs.
      # 1.4.114 (Fix #42 follow-up #3): env+app controllers create the
      # per-Org/per-App Gitea repos as PUBLIC so Flux can clone them
      # anonymously (was failing on "authentication required").
      # 1.4.115 (qa-loop iter-10 Fix #44 — application-controller targetNamespace).
      # Rendered HelmRelease.spec.targetNamespace now resolves to the
      # Application CR's own namespace (was the parent Org slug). Also
      # sets spec.install.createNamespace=true so a missing workload
      # namespace is provisioned by helm-controller on install. Closes
      # matrix rows TC-068 / TC-100 / TC-204 / TC-262 / TC-263 (workload
      # Pod was landing in `omantel-platform` instead of `qa-omantel`).
      # 1.4.116 (Fix #44 follow-up): chart re-publish to bake the
      # rebuilt application-controller image (:24aab61) into values.yaml
      # — chart 1.4.115 had stale tag (:a3ba200) because GH Actions
      # silently filters bot pushes from triggering blueprint-release.
      # 1.4.117 (qa-loop iter-11 Fix #45 Cluster-B + Cluster-C):
      # B) application-controller now observes downstream HelmRelease
      # readiness (helm.toolkit.fluxcd.io/helmreleases get/list/
      # watch added to ClusterRole) and rolls up Application
      # .status.phase from Provisioning → Ready. Periodic 30s
      # re-list ticker so HR readiness flips reach the parent.
      # status.lastReconciledAt populated for TC-113.
      # C) catalyst-api gains GET /sovereigns/{id}/applications/{name}
      # (full Application detail) and accepts ?namespace= alias on
      # /sovereigns/{id}/k8s/{kind}. SPA AppDetail.tsx falls back
      # to the new GET when wizard store has no descriptor (typical
      # chroot Sovereign case — closes the "App not found" misfire
      # on TC-068 / TC-072 / TC-074). TC-262 / TC-263 (services
      # filtered to qa-omantel only) flip PASS via the namespace
      # alias.
      # 1.4.118 (Fix #45 follow-up — chart re-publish): bakes the
      # rebuilt application-controller image (:dfd48b1) into values.yaml.
      # Chart 1.4.117 was packaged before the auto-bump commit landed,
      # same race as 1.4.115/116; this bump re-publishes with the new
      # tag and explicit blueprint-release dispatch.
      # 1.4.109 (Fix #40 follow-up #2): drop /api/v1 from organization +
      # environment-controller GITEA URL defaults — Gitea client appends
      # it; the prior default produced /api/v1/api/v1/... 404s on
      # EnsureOrg / EnsureRepo blocking qa-wp Application reconcile.
      # bootstrap-kit qaFixtures.cnpgPairName default qa-cnpg → qa-cnpgpair
      # so TC-306's "cnpgpair" substring assertion passes.
      # 1.4.123 (qa-loop iter-12 Fix #53A): triggers catalyst-api StatefulSet
      # restart so it picks up the new CATALYST_KC_REALM=omantel value from
      # the bp-keycloak 1.5.0 mirrored Secret (realm-rename target-state).
      # 1.4.127 (qa-loop iter-12 Fix #54 Workstream 4): chart-side
      # templates/catalyst-gitea-token-secret.yaml + post-install Job
      # auto-mints the Gitea PAT into catalyst-gitea-token (replaces
      # kubectl-applied operational hack).
      # 1.4.133 (qa-loop iter-1 prefetch Fix #113, prov #9 wedge):
      # qa-fixtures Kyverno disallow-privileged-containers exclusion
      # list now includes `catalyst` namespace so the registry-pivot
      # DaemonSet shipped by bp-self-sovereign-cutover (which legitimately
      # needs `securityContext.privileged: true` to rewrite
      # /etc/rancher/k3s/registries.yaml on every node) is not blocked
      # by the validating admission webhook. Without this, prov #9
      # bp-self-sovereign-cutover HR went Ready=False and bp-catalyst-
      # platform never reached Ready → console.<sov> Ingress never
      # materialised → iter-1 was unrunnable.
      # 1.4.134 (qa-loop iter-1 prefetch Fix #114, prov #9 unwedge):
      # New pre-install hook Job (qa-finalizer-strip, weight -99)
      # strips orphaned controller finalizers off Application /
      # Organization / Environment / UserAccess CRs in the qa-
      # namespace + force-finalizes the namespace itself if it's
      # stuck Terminating. Breaks the rollback-orphan finalizer
      # deadlock that left prov #9 in an unrecoverable install loop:
      # 1. install creates qa-omantel ns + Application + controllers
      # in same pass (no hook ordering)
      # 2. qa-cnpg-backup-s3-seed post-install hook stalls 15m
      # 3. cleanupOnFail rolls back, killing controllers BEFORE they
      # can process Application's deletion finalizer
      # 4. qa-omantel ns wedged in Terminating; no controller exists
      # 5. retry: "namespace is being terminated" → seed Job RBAC
      # creation rejected → 15m hook timeout → loop forever.
      # This Job runs at the very start of every install attempt and
      # guarantees a clean slate.
      # 1.4.136 (qa-loop iter-1 Fix #124, secondary Fix #122): convert
      # catalyst-gitea-token bootstrap from post-install to pre-install
      # hook so catalyst-catalog + catalyst-organization-controller
      # (which validate non-empty CATALYST_GITEA_TOKEN at startup) see
      # a populated Secret at first container start. Prior post-install
      # ordering caused chicken-and-egg deadlock: Deployments crashed
      # because Secret was empty; mint Job ran AFTER Deployments,
      # exponential back-off blew past Helm's 15m install timeout,
      # remediation looped forever. Pre-install hook (weight=10) now
      # populates the Secret (weight=5) BEFORE any consumer Deployment
      # rolls. See Chart.yaml top comment for the full diagnostic chain.
      # 1.4.135 (qa-loop bounded-provision-cycle Fix #119): sanitize
      # illegal `/` in qa-fixtures Continuum mirror label value. Prov
      # #10 wedge — helm install crashed on Continuum CR validation
      # because the Fix #102 platform-mirror label
      # `openova.io/continuum-mirror-of: <ns>/<name>` violates k8s
      # label-value spec (`/` forbidden in values, allowed only in
      # keys as the prefix separator). Split into two valid labels:
      # `openova.io/continuum-mirror-of-namespace` +
      # `openova.io/continuum-mirror-of-name`. Unblocks prov #11+.
      # 1.4.138 (qa-loop iter-1 Fix #138, prov #20 wedge): converts
      # qa-fixtures qa-cnpg-backup-s3-seed + qa-cnpg-status-seed Jobs
      # from post-install hooks → regular release resources. Resolves
      # the circular bootstrap-kit DAG (this slot 13 install hook needed
      # bp-seaweedfs slot 18 to be Ready, which couldn't happen until
      # this HR was Ready). bp-catalyst-platform install now completes
      # in ~5 min instead of timing out at 15 min then loop-rolling back.
      # 1.4.137: deploy-bot auto-bump (no template changes).
      # 1.4.139 (Fix #163, 2026-05-11, MIRROR-EVERYTHING): every
      # chart-hook image reference in this Blueprint uses the explicit
      # harbor.openova.io/proxy-dockerhub prefix per CLAUDE.md
      # inviolable rule. SBOM-auditable, no functional change.
      # 1.4.140 (qa-loop Wave 27 Fix #184, prov #33 wedge, 2026-05-11):
      # catalyst-gitea-token-mint pre-install hook Gitea-API wait loop
      # raised from hardcoded 60×5s (300s = 5m) to values-driven knob
      # (giteaWait.iterations × giteaWait.intervalSeconds, default
      # 168×5 = 840s = 14m). Covers the autoscaler-hcloud cold-start
      # observed on multi-region prov #33: workerCount=0 (Fix #157
      # sizing) means the autoscaler must spawn a worker in fsn1/hel1
      # before bp-gitea's Pod can schedule, which takes 10-15m on a
      # fresh provision. Pre-Fix #184 budget (300s) always expired
      # before gitea was reachable → bp-catalyst-platform installFailed
      # and HR loop-rolled forever. Budget arithmetic: hook 840s + 60s
      # slack ≤ HR install.timeout 900s (15m).
      # 1.4.141 (qa-loop Fix #185, prov #38/#39/#41 recurrence,
      # 2026-05-12): qa-finalizer-strip pre-install hook (helm.sh/hook-
      # weight -99) now tolerates the control-plane NoSchedule taint
      # and runs with priorityClassName: system-cluster-critical so it
      # is ALWAYS schedulable regardless of worker-node CPU saturation.
      # Root cause on prov #41: after bootstrap-kit fan-out the worker
      # (cpx32, 8vCPU/16GB) sat at 99% CPU requests; the autoscaler
      # had backed off scale-up of a second worker; the Job's 50m CPU
      # request couldn't be satisfied; Helm pre-install timed out at
      # 15m; Flux remediated 3× and gave up. Same recurring failure on
      # prov #38, #39, #41 — all on chart pin 1.4.140 which (correctly)
      # had no scheduling concession for the -99 hook. Image switched
      # from bitnamilegacy/kubectl:1.29.3 → alpine/k8s:1.31.4 in same
      # commit (rule-17 MIRROR-EVERYTHING hygiene; bitnamilegacy is
      # the Docker-Hub redirect for deprecated Bitnami 2025-08 cutover).
      # 1.4.147 (D31 wordpress-tenant activeHotStandby + D21 owner auto-seed):
      # - PR #1562 wires bp-cnpg-pair Primary+Replica pattern into
      # wordpress-tenant chart via pg.activeHotStandby knob
      # - PR #1564 baked into catalyst-api:8d2a947 — handover now
      # auto-seeds the operator's UserAccess CR (D21 zero-touch)
      # 1.4.146 (D29 billing internal JWT bypass for public routes):
      # - PR #1561 mirrors PR #1559's gateway public routes in the billing
      # service's own JWT middleware. Without this, the gateway passed
      # through but billing still 401-d.
      # 1.4.145 (D29 gateway public routes for redeem flow):
      # - PR #1559 makes /api/billing/{vouchers/redeem-preview,plans,addons}
      # public so the marketplace /redeem?code=XXX landing can validate
      # codes without auth (the entire D29 voucher-redeem zero-touch
      # flow is broken without this)
      # 1.4.144 (D27 admin tag override + D28 voucher email wire):
      # - PR #1557 decouples admin tag from smeTag bundle (admin image
      # may not publish for every SME services CI SHA — caught t132
      # 2026-05-16 with admin:b0ed216 stuck in ImagePullBackOff)
      # - PR #1556 adds the billing→notification wire so the voucher
      # issuance flow emails the recipient (D28 zero-touch contract)
      #
      # 1.4.148 (D16 + D17 + D27 founder-flagged bug fixes, t139 verify cycle):
      #   - PR #1583: D16 /cloud nodes multi-cluster fan-out + handover
      #     export retry/reorder/auth-bypass (catalyst-api 2ab8a0e)
      #   - PR #1584: D27 catalog fresh-seed Published=true default
      #     (sme services catalog 964dc15)
      #   - PR #1585: D17 /app/$componentId route-collision fix (catalyst-ui 2ab8a0e)
      # Caught on t136/t138 fresh-prov runs: bootstrap-kit was
      # still pinned to 1.4.147 → none of the fixes reached the chroot.
      # 1.4.153 — D17 Wave-1 Family A: /cloud?view=list&kind=<X>
      # no longer drifts to /dashboard (kind-alias map in
      # router.tsx validateSearch). Caught on t10.omantel.biz
      # test agents E/C2 2026-05-17.
      # 1.4.155 — Wave 5 UX polish (founder review 2026-05-17):
      #   - Sidebar reorder: Dashboard → Cloud → Apps → Jobs → Users →
      #     BSS → Settings (operator mental model: overview → infra →
      #     workloads → ops → access → commerce → config).
      #   - BSS icon swapped from bespoke receipt glyph to briefcase
      #     line-glyph matching the rest of the icon family.
      #   - Marketplace toggle moved off Settings sub-nav + standalone
      #     /settings/marketplace page INTO SettingsPage as a
      #     <SectionCard id="marketplace"> anchor section (same pattern
      #     as #dns, #sovereign, #notifications). MarketplaceSettings.tsx
      #     page deleted; MarketplaceSection.tsx new inner component;
      #     /settings/marketplace route + sidebar sub-nav child removed.
      #     Old URL now 404s — operators click Settings then scroll to
      #     the Marketplace anchor.
      #   - Save flow UNCHANGED: POST /api/v1/sovereigns/{id}/marketplace
      #     still commits per-Sovereign overlay to GitOps repo, Flux
      #     reconciles ~1 min.
      #
      # 1.4.154 — Wave 2 collector PR. Bundles 6 Fix-Author PRs that
      # landed AFTER the 1.4.153 Wave-1 roll, all from the same t10
      # test sweep:
      #   - #1598 Family F: BSS menu in Sovereign Console
      #     (Billing/Orders/Revenue/Vouchers/Tenants iframe-embed of
      #     marketplace.<fqdn>/back-office/*). Founder bug #1.
      #   - #1599 Family D: dashboard treemap fan-out for cluster /
      #     region / vcluster / family + Layer-1 cluster default.
      #     Founder bug #2.
      #   - #1600 Family C: ResourceDetailPage real-data rewrite —
      #     per-kind summary, owner chain, navigate (not assign).
      #     Founder bug #5.
      #   - #1601 Family G: 6 singletons — hcloud-volumes StorageClass
      #     (C9-006), /fleet/applications aggregator (C10-002),
      #     secondary install-* Job bridge backfill (C10-003), legacy
      #     wildcard-tls cert cleanup (C7-007), D22 settings em-dash
      #     placeholder lift (C8-001), /jobs region filter (C8-005).
      #   - #1602 Family E: Compliance UI — Falco runtime alerts +
      #     SBOM/CVE tab + framework filter chip strip + policy
      #     drilldown live-cluster fallback + PolicyReport /
      #     ClusterPolicyReport list kinds (C11-003/005/006/007/008/
      #     009/010).
      #   - #1603 Family B: AppDetail HR-overlay status sync +
      #     Resources/Logs tab namespace+label fix (HR.spec.target-
      #     Namespace + chart-name label) + "Bootstrap blueprint"
      #     chip for bp-* (founder bug #4, C4-003/004/005/007/013).
      # 1.4.163 (Wave 16 collector, 2026-05-18): republishes the chart
      # OCI artifact so it actually contains every chart-template change
      # merged after the 1.4.162 publish (commit 0ad78790). Without the
      # republish, bootstrap-kit pin 1.4.162 pulls an artifact missing
      # the new templates and Sovereigns boot with stale chart bytes.
      # Baked: #1644 tenantPublic HTTPRoute reconciler + #1650
      # tenantPublic setter on product-install + #1640 Cilium Gateway
      # per-zone listener pairs + #1654 bp-newapi attestation gate +
      # sandbox-controller post-handover refinements (D31 HS env vars,
      # sovereign-fqdn ConfigMap keys, cutover-driver sandboxes RBAC,
      # values.yaml sovereign.{enableHotStandby,primaryRegion,
      # replicaRegion} defaults). See Chart.yaml header comment for
      # the full change list.
      # 1.4.166 (TBD-E8 / C4-015, 2026-05-18): seed 13 baseline Blueprint
      # CRs unconditionally so `/api/v1/catalog` returns a non-empty
      # items[] from handover-time. Pre-fix every fresh Sovereign had
      # empty catalog because (a) self-sovereign-cutover step-01 only
      # mirrors `openova-io/openova` into Gitea — not the `catalog` /
      # `catalog-sovereign` Orgs that catalyst-catalog reads from — and
      # (b) qa-fixtures (the only chart-shipped Blueprint CRs) defaults
      # OFF on production. Adds templates/catalog-seed/blueprints.yaml
      # (bp-wordpress-tenant, bp-cnpg, bp-keycloak, bp-grafana,
      # bp-prometheus, bp-loki, bp-redis, bp-clickhouse, bp-opensearch,
      # bp-temporal, bp-n8n, bp-langfuse, bp-llm-gateway) which the
      # chained catalog client surfaces via in-cluster LIST fallback.
      # 1.4.168 (TBD-C18b, 2026-05-18): stop clobbering the cutover-minted
      # Gitea API token. templates/sme-services/provisioning-github-
      # token.yaml gains a lookup-persistence guard — if the destination
      # Secret carries annotation `catalyst.openova.io/token-source:
      # self-sovereign-cutover-step-09` (stamped by Step 09 of bp-self-
      # sovereign-cutover when it mints the real Gitea API token), the
      # template preserves the existing GITHUB_TOKEN bytes instead of
      # mirroring gitea-admin-secret.password over them on every Flux
      # reconcile. Pre-fix on t22: Step 09 minted a real token at
      # 13:43:33Z; ~5 min later helm reconcile rewrote GITHUB_TOKEN back
      # to the admin password bytes, so every subsequent SME provisioning
      # call to Gitea returned 401 "user does not exist" and journey
      # step 16 (tenant repo creation) silently stuck.
      # 1.4.179 (TBD-A14/A15/A10b, 2026-05-18): three t24 zero-touch
      # Wave 36 P1 fresh-prov blockers — see chart Chart.yaml header for
      # the full diagnostic + fix description per gate.
      #   - A14 issue #1843: networkpolicies (networking.k8s.io) RBAC
      #     get/list/watch verbs added to clusterrole-cutover-driver.
      #   - A15 issue #1844: sovereign-fqdn ConfigMap empty fields
      #     populated end-to-end via the cloud-init → bootstrap-kit →
      #     chart substitute chain (configuredRegions / controlPlaneIP /
      #     primaryRegion / replicaRegion / selfDeploymentId /
      #     enableHotStandby / qaApplications). This Kustomization gains
      #     3 new value mappings: global.sovereignSelfDeploymentId,
      #     sovereign.configuredRegions, sovereign.qaApplications.
      #   - A10b issue #1845: GET kubeconfig?region=<cloudRegion>
      #     resolves the slot-suffixed on-disk shape
      #     `<id>-<region>-<i>.yaml` (handler-side glob fallback).
      # 1.4.181 (catch-up for Blueprint Release workflow outage,
      # 2026-05-18 21:04Z → 22:07Z): chart published 1.4.180 → 1.4.181
      # during the YAML scanner break introduced by PR #1858 and fixed
      # by PR #1866. Auto-bump-pin step didn't fire during the outage,
      # so this pin lagged by 2 versions. Refs #1864.
      version: 1.4.189
      sourceRef:
        kind: HelmRepository
        name: bp-catalyst-platform
        namespace: flux-system
  # Event-driven install: umbrella chart deploys ~10 Catalyst services
  # (console, marketplace, admin, catalog-svc, projector, provisioning,
  # environment-controller, blueprint-controller, billing). Inter-service
  # readiness via OTel/NATS subjects is multi-minute and not Helm's
  # concern. Replaces PR #221 spec.timeout: 15m.
  #
  # Issue #910 (otech105 incident, 2026-05-05): 15m was too tight for
  # bp-catalyst-platform on a fresh franchised Sovereign with the full
  # SME service stack (sme-services + tenant-orchestration + post-install
  # secret mirror Jobs). The chart genuinely needs ~20 minutes worst
  # case before remediation.retries kicks in. Bumped to 25m
  # specifically for this umbrella chart — every other bp-* chart
  # remains at its previous (or default) timeout because they install
  # in well under 5 minutes empirically.
  #
  # chart-roll-rca iter-15 (2026-05-10): timeout reduced 25m → 15m and
  # remediation hardened with cleanupOnFail + strategy: rollback +
  # remediateLastFailure. Background: the 25m ceiling existed to absorb
  # the dep-ordering race RC-1 (qa-fixtures UserAccess CRs rendering
  # before the bp-crossplane-claims XRD existed). With that race fixed
  # via the bp-crossplane-claims dependsOn edge above, 15m is plenty for
  # the umbrella's true install latency on a healthy cluster.
  # cleanupOnFail purges partial release artifacts on retry; rollback
  # strategy reverts to the last good release before retrying instead of
  # leaving the release Secret pinned at `pending-upgrade` for the full
  # timeout ceiling. Net effect: a failed-then-recoverable upgrade
  # collapses from ~75m worst case → ~15m worst case.
  #
  # post-prov7 fix (2026-05-10, refs chart-roll-rca-iter15): the
  # HelmRelease v2 schema only allows `cleanupOnFail` and
  # `remediation.strategy` on the `upgrade` block. The previous version
  # of this file placed both fields on the `install` block as well,
  # which caused the bootstrap-kit Kustomization to fail dry-run on a
  # fresh Sovereign with `field not declared in schema`, blocking ALL
  # HRs from rendering. The install block here keeps only the schema-
  # legal fields (`retries`, `remediateLastFailure`); rollback semantics
  # apply naturally to upgrades, and a failed first install is
  # remediated via retry without rollback (no prior release to roll
  # back to).
  #
  # F8 fix (2026-05-12, prov #44 RCA): bumped install + upgrade timeout
  # 15m → 30m. F1-F7 ship live on main, qa-finalizer-strip Completed
  # and autoscaler workers joined, but bp-catalyst-platform HR was
  # still mid-retry (failures=3) at the catalyst-api 60m phase1 watch
  # cap on d9399223c3caa4f9. Total bootstrap-kit install on a fresh
  # cpx42×1 Sovereign genuinely exceeds the 15m PR #221 ceiling when
  # the umbrella chart's full SME + Catalyst service stack rolls
  # without a warm Harbor proxy-cache. Paired with the F8 catalyst-api
  # DefaultWatchTimeout bump (60m → 120m) so the outer watch budget
  # comfortably contains the new 30m × 3-retry inner HR ceiling.
  install:
    disableWait: true
    timeout: 30m
    remediation:
      retries: 3
      remediateLastFailure: true
  upgrade:
    disableWait: true
    timeout: 30m
    remediation:
      retries: 3
      strategy: rollback
      remediateLastFailure: true
    cleanupOnFail: true
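  # Worked budget check for the timeouts above (a sketch; it takes the
  # "30m × 3-retry" figure from the F8 note at face value):
  #   inner HR ceiling  = 3 × 30m = 90m
  #   outer watch cap   = 120m (catalyst-api DefaultWatchTimeout)
  #   headroom          = 120m - 90m = 30m
  # Assumption, not stated in this file: if the first attempt is counted
  # in addition to the 3 retries, the ceiling is 4 × 30m = 120m and the
  # headroom shrinks to zero.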
  # Per-Sovereign overrides for the umbrella — sovereign-FQDN-derived hostnames
  # for console/admin/api. All chart-level Catalyst service config (image refs,
  # OTel endpoints, NATS subjects) lives in products/catalyst/chart/values.yaml.
  values:
    global:
      sovereignFQDN: ${SOVEREIGN_FQDN}
      # sovereignLBIP — Sovereign's load-balancer public IPv4. Issue #900:
      # the Day-2 multi-domain add-domain flow uses this to pre-register
      # glue records at the customer's registrar before flipping NS.
      # Resolved via envsubst from `SOVEREIGN_LB_IP` set in the Sovereign
      # cloud-init env (rendered into bootstrap-kit by infra/hetzner from
      # hcloud_load_balancer.main.ipv4 — see infra/hetzner/main.tf:274).
      # When the Sovereign cloud-init pre-dates #900 the env stays empty
      # and the chart renders an empty `lbIP` ConfigMap key — catalyst-api
      # then short-circuits the glue registration and falls back to plain
      # set_ns (legacy behaviour).
      sovereignLBIP: ${SOVEREIGN_LB_IP}
      # sovereignSelfDeploymentId — the catalyst-api deployment-record id
      # this Sovereign was provisioned under on the contabo mothership.
      # Threaded from cloud-init's SOVEREIGN_DEPLOYMENT_ID Kustomization
      # postBuild substitute. Consumed by the chart's sovereign-fqdn
      # ConfigMap `selfDeploymentId` key so the chroot catalyst-api's
      # GET /api/v1/sovereign/self answers with the correct id at
      # handover-time (no wait for the orchestrator's chart-values
      # overlay write). TBD-A15 (t24 zero-touch, 2026-05-18, issue #1844).
      sovereignSelfDeploymentId: '${SOVEREIGN_DEPLOYMENT_ID:-}'
    ingress:
      hosts:
        console:
          host: console.${SOVEREIGN_FQDN}
        admin:
          host: admin.${SOVEREIGN_FQDN}
        marketplace:
          host: marketplace.${SOVEREIGN_FQDN}
        api:
          host: api.${SOVEREIGN_FQDN}
    # Marketplace mode (issue #710). Toggle to true via envsubst
    # MARKETPLACE_ENABLED in the per-Sovereign overlay (catalyst-api
    # writes this when the wizard's "Enable Marketplace" component is
    # checked). When true, bp-catalyst-platform 1.3.0+ renders the
    # marketplace + tenant-wildcard HTTPRoutes and the cross-namespace
    # ReferenceGrant.
    marketplace:
      enabled: ${MARKETPLACE_ENABLED:-false}
    # ─── Multi-zone parent domains (issue #827, parent epic #825) ──────
    # One wildcard Certificate per parent zone, rendered by chart 1.4.0+
    # into kube-system. Each cert renews independently; a stalled
    # DNS-01 challenge on one zone never blocks another zone's renewal.
    # Source of truth is the same ${PARENT_DOMAINS_YAML} variable used
    # by bootstrap-kit slot 11 (bp-powerdns) so the two slots stay in
    # lockstep on what the Sovereign considers a parent zone.
    # When the operator brings only one parent domain (default
    # zero-touch flow), cloud-init pre-renders this variable to a
    # single-entry array derived from ${sovereign_fqdn}.
    parentZones: ${PARENT_DOMAINS_YAML}
    # ─── Wildcard cert issuer environment (Fix #123, LE rate-limit) ────
    # Default-OFF (production LE issuer); flipped to true via envsubst
    # WILDCARD_CERT_USE_STAGING=true on the per-Sovereign overlay for any
    # Sovereign that should issue staging-LE certs instead of production.
    # The qa-loop coordinator pairs this knob with QA_FIXTURES_ENABLED on
    # QA Sovereigns (omantel.biz and qa.* pools) so the wipe + re-provision
    # cadence never trips Let's Encrypt's 5-certs/168h production ceiling
    # per exact set of hostnames. Customer Sovereigns leave this empty (=false)
    # and get real-trusted production certs.
    #
    # Staging certs are signed by Fake LE Intermediate X1; browsers
    # reject without an explicit exception, but `curl -sk` and Playwright
    # (ignoreHTTPSErrors:true) accept them — sufficient for the qa-loop
    # Test Executor's contract assertions.
    #
    # Per docs/INVIOLABLE-PRINCIPLES.md #4 every Sovereign may flip this
    # independently; the chart values.yaml carries the staging issuer
    # name (`letsencrypt-dns01-staging-powerdns`, shipped by
    # bp-cert-manager-powerdns-webhook 1.1.0+) as an overridable default.
    wildcardCert:
      useStaging: ${WILDCARD_CERT_USE_STAGING:-false}
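    # Illustrative QA overlay (a sketch, not copied from a real overlay):
    # the two knobs named above set together through a Flux Kustomization
    # postBuild substitute. The variable names come from this file; where
    # the per-Sovereign overlay lives is an assumption.
    #
    #   postBuild:
    #     substitute:
    #       WILDCARD_CERT_USE_STAGING: "true"
    #       QA_FIXTURES_ENABLED: "true"
    #
    # With neither variable set, the `:-false` defaults apply and the
    # Sovereign gets production certificates and no fixtures.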
    # ─── Sovereign-side region seeding (DoD D5) ─────────────────────
    # regionsJson — JSON-array literal of the canonical multi-region
    # RegionSpec[] this Sovereign was provisioned with. Threaded
    # through from the mothership prov body via the tofu cloud-init
    # `SOVEREIGN_REGIONS_JSON` envsubst placeholder. The chart writes
    # this string into the `sovereign-fqdn` ConfigMap's `regionsJson`
    # key (sovereign-fqdn-configmap.yaml); the catalyst-api Pod reads
    # via env `SOVEREIGN_REGIONS_JSON`; chrootEnsureDeployment parses
    # and stamps Request.Regions so /infrastructure/topology emits
    # the right per-region tree and /cloud?view=graph renders all
    # N regions correctly. Without this the chroot fell back to the
    # live-Nodes path and emitted "1 cluster 1 region" on every
    # multi-region Sovereign (caught on t126, 2026-05-16).
    sovereign:
      # MUST be quoted: SOVEREIGN_REGIONS_JSON contains valid JSON like
      # `[{"cloudRegion":"hel1",...}]`. Without quotes, YAML interprets
      # the JSON as a YAML flow-sequence-of-flow-mappings, parses into
      # `[]map[string]interface{}`, then Helm's chart template `{{ .Values.
      # sovereign.regionsJson }}` stringifies via Go's `%v` printf —
      # producing `[map[cloudRegion:hel1 ...]]` (Go map syntax, NOT JSON).
      # The chroot's chrootRegionsFromEnv then can't json.Unmarshal it →
      # falls back to live-Nodes path → /cloud renders "1 region 1 cluster"
      # on every multi-region Sovereign. Caught on t131 2026-05-16.
      # Single-quoted so embedded double-quotes in the JSON are literal.
      regionsJson: '${SOVEREIGN_REGIONS_JSON:-}'
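      # Illustration of the quoting rule above, using a hypothetical
      # substituted value (the real array carries more fields):
      #
      #   regionsJson: '[{"cloudRegion":"hel1"}]'   # YAML string, stays JSON text
      #   regionsJson: [{"cloudRegion":"hel1"}]     # YAML list of maps, rendered
      #                                             # as [map[cloudRegion:hel1]]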
      # ─── D22 (settings empty values) sovereign-side identity ──────────
      # ORG_EMAIL / ORG_NAME / SOVEREIGN_CONTROL_PLANE_IP / GITOPS_REPO_URL
      # threaded from cloud-init (provisioner.go::writeTfvars + Hetzner
      # tofu cloudinit-control-plane.tftpl). Chart's sovereign-fqdn
      # ConfigMap exposes these as keys; catalyst-api reads via env in
      # api-deployment.yaml (PR #1569); chrootEnsureDeployment populates
      # the deployment record so Sovereign Console Settings page renders
      # real ownerEmail/region/controlPlaneIP/gitopsRepoURL/consoleURL
      # instead of `—` placeholders. Empty default = same as today,
      # backwards-compatible for charts that don't have the cloud-init
      # placeholders wired yet.
      orgEmail: '${ORG_EMAIL:-}'
      orgName: '${ORG_NAME:-}'
      controlPlaneIP: '${SOVEREIGN_CONTROL_PLANE_IP:-}'
      gitopsRepoURL: '${GITOPS_REPO_URL:-}'
      # ─── D31 active-hot-standby (cross-region CNPG) ──────────────────
      # Sovereign-level opt-in for the active-hot-standby Postgres shape
      # on every CNPG-backed tenant app the marketplace installs.
      # Default-OFF — every Sovereign that has not flipped
      # SOVEREIGN_ENABLE_HOT_STANDBY=true on the per-Sovereign overlay
      # keeps rendering single-Cluster CNPG (no regression). When ON
      # AND both region keys are non-empty AND distinct, the SME-tenant
      # gitops writer injects pg.activeHotStandby.* into every fresh
      # bp-wordpress-tenant HelmRelease so the chart's
      # cnpg-cluster.yaml template renders a primary + replica
      # Cluster.postgresql.cnpg.io pair across the two regions, WAL
      # streaming over Cilium ClusterMesh (DoD D11 + D31). Same wiring
      # extends to any future tenant product chart (gitlab-tenant,
      # nextcloud-tenant) that adopts the same value contract.
      #
      # Region keys MUST match the canonical openova.io/region node
      # label value (e.g. `hz-fsn-rtz-prod`, `hz-hel-rtz-prod`) — the
      # WordPress chart's cnpg-cluster.yaml uses nodeAffinity on that
      # label to pin the primary + replica Pods to the right regions.
      enableHotStandby: '${SOVEREIGN_ENABLE_HOT_STANDBY:-}'
      primaryRegion: '${SOVEREIGN_PRIMARY_REGION:-}'
      replicaRegion: '${SOVEREIGN_REPLICA_REGION:-}'
      # configuredRegions — YAML list of region keys this Sovereign was
      # provisioned with (e.g. ["fsn1", "hel1"]). Threaded from cloud-init's
      # SOVEREIGN_CONFIGURED_REGIONS_YAML Kustomization postBuild substitute
      # which the tofu module renders as a YAML inline list literal from
      # var.regions[*].cloudRegion. The chart's sovereign-fqdn ConfigMap
      # joins this list into a comma-separated `configuredRegions` key for
      # the catalyst-ui Dashboard SovereignCard + Networking → ClusterMesh
      # tab to render configured-but-not-active chips. Defaults to empty
      # list so non-multi-region Sovereigns surface only their live region.
      # TBD-A15 (t24 zero-touch, 2026-05-18, issue #1844).
      configuredRegions: ${SOVEREIGN_CONFIGURED_REGIONS_YAML:-[]}
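      # Example rendering, assuming the two-region value quoted above:
      #
      #   configuredRegions: ["fsn1", "hel1"]   # after substitution
      #   configuredRegions: []                 # variable unset, default applies
      #
      # Per the description above, the first form would surface in the
      # sovereign-fqdn ConfigMap as `configuredRegions: fsn1,hel1`. The
      # exact join is done by the chart template, which is not shown here.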
      # qaApplications — YAML list of qa-fixtures applicationRef literals
      # the chroot Sovereign's /compliance/scorecard surface emits via
      # appRefs[]. Default empty so production Sovereigns surface only
      # PolicyReport-observed apps. QA Sovereigns set via QA_APPLICATIONS_YAML.
      # TBD-A15 (t24 zero-touch, 2026-05-18, issue #1844).
      qaApplications: ${QA_APPLICATIONS_YAML:-[]}
    # ─── QA fixtures (qa-loop iter-6 Cluster-F + EPIC-6 iter-6) ────────
    # Default-OFF on production; flipped to true via envsubst
    # QA_FIXTURES_ENABLED=true on the per-Sovereign overlay for any
    # Sovereign that participates in qa-loop matrix testing. Renders
    # the 8-resource fixture stack (qa-omantel ns + qa-wp Application +
    # cont-omantel Continuum CR + qa-cnpg CNPGPair + pdm-1/2/3 PDM CRs +
    # ScheduledBackup + status seeder Jobs) the matrix asserts on. See
    # products/catalyst/chart/templates/qa-fixtures/_README.txt.
    qaFixtures:
      enabled: ${QA_FIXTURES_ENABLED:-false}
      # qa-loop iter-11 Cluster-A: tier-scoped test-session minting.
      # Enables POST /api/v1/auth/test-session?tier=<viewer|...|owner>
      # in catalyst-api so the 5-agent QA executor can mint per-tier
      # session cookies and assert the matrix's tier-boundary 403/200
      # contracts on every privileged endpoint. Default true on the
      # bootstrap-kit because every Sovereign that has qaFixtures.enabled
      # is by definition a QA Sovereign; keeping the second knob
      # off-by-default would force a per-Sovereign override on the same
      # axis the first knob already gates. Production Sovereigns (qaFixtures.enabled=false)
      # don't render the seeder UserAccess CRs so the endpoint has no
      # tier-bound users to authenticate against, so it is wire-safe by
      # construction. Override to "false" to disable the endpoint on
      # an otherwise-QA Sovereign that wants only the resource fixtures.
      testSessionEnabled: ${QA_TEST_SESSION_ENABLED:-true}
      namespace: ${QA_FIXTURES_NAMESPACE:-qa-omantel}
      appName: ${QA_FIXTURES_APP:-qa-wp}
      # Sovereign FQDN for Organization.spec.sovereignRef. CRD validation
      # `^[a-z0-9]([a-z0-9-]*[a-z0-9])?(\.[a-z0-9]([a-z0-9-]*[a-z0-9])?)+$`
      # requires a dotted FQDN (single label "omantel" rejected). Defaults
      # to ${SOVEREIGN_FQDN} from the Kustomization postBuild substitute
      # so every Sovereign gets its own correct FQDN automatically.
      sovereignRef: ${SOVEREIGN_FQDN:-omantel.biz}
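      # How the CRD pattern above treats a few candidate values (a sketch;
      # the first two come from this file, the last is hypothetical):
      #
      #   omantel.biz        accepted (two labels)
      #   omantel            rejected (single label, no dot)
      #   Omantel.biz        rejected (uppercase is outside [a-z0-9])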
      organization: ${QA_ORGANIZATION:-omantel-platform}
      continuumName: ${QA_CONTINUUM_NAME:-cont-omantel}
      # Default embeds "cnpgpair" substring so the matrix's
      # `kubectl get cnpgpair -n qa-omantel` stdout (TC-306 must_contain
      # ["cnpgpair", "fsn1", "hz-hel-rtz-prod"]) round-trips against the
      # rendered NAME column. Pre-Fix #40 the default `qa-cnpg` produced
      # a NAME column missing the "pair" substring (Fix #40 follow-up).
      cnpgPairName: ${QA_CNPGPAIR_NAME:-qa-cnpgpair}
      # 4-segment canonical region label per Application + Environment
      # CRD validation `^[a-z]+-[a-z]+-[a-z]+-[a-z]+$`. Legacy "fsn1"
      # rejected at admission and pinned omantel on the prior image SHA
      # (Fix #38 follow-up — caught after chart 1.4.105 still failed
      # because the bootstrap-kit's release-config override beat the
      # chart values.yaml default).
      primaryRegion: ${QA_PRIMARY_REGION:-hz-fsn-rtz-prod}
      standbyRegion: ${QA_STANDBY_REGION:-hz-hel-rtz-prod}
      # CNPGPair short-form region labels — distinct seam from the
      # canonical 4-segment Application/Environment/Continuum regions
      # because the CNPGPair CRD validates against the more permissive
      # `^[a-z0-9]+(-[a-z0-9]+)*$` and the cnpg-pair-controller's CCM
      # zone-affinity convention uses the Hetzner short form (`fsn1`,
      # `hel1`). The two seams stay in lockstep via the qa-fixtures
      # node-labels-seeder Job that patches every node with
      # topology.kubernetes.io/region=<short> derived from the existing
      # openova.io/region=<canonical> label (Fix #40 Cluster-B).
      cnpgPairPrimaryRegion: ${QA_CNPGPAIR_PRIMARY_REGION:-fsn1}
      cnpgPairReplicaRegion: ${QA_CNPGPAIR_REPLICA_REGION:-hz-hel-rtz-prod}
      pdmZone: ${QA_PDM_ZONE:-openova.io}
      # qaFixtures.sovereignFQDN — explicit FQDN override consumed by
      # templates/qa-fixtures/organization-omantel-platform.yaml's
      # resolution chain (qaFixtures.sovereignFQDN →
      # global.sovereignFQDN → qaFixtures.sovereignRef-if-FQDN →
      # "omantel.biz"). Defaults to the Sovereign-wide FQDN so an
      # operator never has to set it explicitly. Distinct from the
      # qaFixtures.sovereignRef knob (line 393) which is now FQDN-form
      # too via #1244 — kept as a backup signal in case a future
      # refactor splits the seams again. (Fix #40 Cluster-A.)
      sovereignFQDN: ${SOVEREIGN_FQDN:-}
      # CNPG Cluster CR fixtures (Fix #37) — single-region by default;
      # multi-region drill is owned by Continuum DR controllers + the
      # cnpg-pair-controller. Override the *Region knobs once cross-
      # region NodePort filtering is resolved (incidents.md §"Hetzner
      # cross-region NodePort 32379 filtered").
      cnpgPrimaryClusterName: ${QA_CNPG_PRIMARY_CLUSTER:-cluster-primary}
      cnpgReplicaClusterName: ${QA_CNPG_REPLICA_CLUSTER:-cluster-replica}
      cnpgPrimaryRegion: ${QA_CNPG_PRIMARY_REGION:-hz-fsn-rtz-prod}
      cnpgReplicaRegion: ${QA_CNPG_REPLICA_REGION:-hz-fsn-rtz-prod}
      cnpgImage: ${QA_CNPG_IMAGE:-ghcr.io/cloudnative-pg/postgresql:16.4-1}
      cnpgStorageClass: ${QA_CNPG_STORAGE_CLASS:-local-path}
      cnpgStorageSize: ${QA_CNPG_STORAGE_SIZE:-1Gi}
      # Kyverno baseline policies (Fix #37). disallow-privileged-containers
      # ships in Enforce mode; the other 18 baseline policies in Audit so
      # the matrix sees ClusterPolicyReports without blocking platform
      # pods. Soft-launch by setting Audit on a fresh Sovereign.
      kyvernoEnforceMode: ${QA_KYVERNO_ENFORCE_MODE:-Enforce}