* fix(bp-external-dns): grant apiserver egress via CiliumNetworkPolicy (closes #770)
Root cause: ExternalDNS crashloops on every fresh Sovereign provision
with `failed to sync *v1.Endpoints: context deadline exceeded`. The
companion vanilla NetworkPolicy egress rule
`to: ipBlock: 0.0.0.0/0 ports: 443,6443` does NOT match traffic to the
kube-apiserver under Cilium with the default `policy-cidr-match-mode: ""`.
Cilium models the apiserver as a reserved identity, not a CIDR range,
so the ipBlock rule is bypassed and the apiserver call is dropped at
the egress hook of the external-dns endpoint.
Fix: render a companion CiliumNetworkPolicy with
`toEntities: [kube-apiserver]` scoped to the external-dns Pod selector.
This is the canonical Cilium pattern for controllers that watch the
apiserver. The existing vanilla NetworkPolicy is preserved verbatim so
the Blueprint remains CNI-agnostic per BLUEPRINT-AUTHORING.md.
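A minimal sketch of the companion policy described above. The policy name,
namespace, and Pod label are assumptions for illustration; the chart's
rendered selector may differ:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: external-dns-apiserver-egress   # hypothetical name
  namespace: external-dns               # assumed namespace
spec:
  endpointSelector:
    matchLabels:
      app.kubernetes.io/name: external-dns   # assumed Pod label
  egress:
    # Match the apiserver by its reserved Cilium identity, not by CIDR.
    - toEntities:
        - kube-apiserver
```

Policies selecting the same endpoint are additive allow-lists, so this only
widens egress for the selected Pods and leaves the vanilla NetworkPolicy
rules in force.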
Live proof on otech93 (2026-05-04): manually applied the rendered CNP
to the running cluster; external-dns went from CrashLoopBackOff
(8 restarts in 20m) to 1/1 Running within 30s, and the informer cache
sync completed cleanly.
Bumps bp-external-dns 1.1.6 → 1.1.7.
Why not `policy-cidr-match-mode: nodes` cluster-wide on bp-cilium? It
silently relaxes EVERY other NetworkPolicy that uses 0.0.0.0/0 in the
cluster — too broad. Per INVIOLABLE-PRINCIPLES the fix MUST be scoped
to the workload that needs it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(_template): bump bp-external-dns 1.1.6 → 1.1.7 to pick up CNP fix
Pairs with the chart bump in the same PR. Every fresh otech provision
hydrates clusters/_template/, so this pin is what determines the
version installed. Without bumping here, otech94+ would still use
1.1.6 and continue to crashloop with the apiserver-egress symptom.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: hatiyildiz <hatice@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #679 added --request-timeout=120s but external-dns has TWO timeouts:
RequestTimeout (per-API-call, controlled by --request-timeout) and
WaitForCacheSync (initial informer sync, hardcoded 60s in upstream binary,
NOT exposed as a flag). On a fresh Sovereign with a CPU-saturated k3s
apiserver, the cache sync exceeds 60s, the binary fatals with
`failed to sync *v1.Node: context deadline exceeded`, and the Pod
restarts 5-10 times in CrashLoopBackOff.
Caught live on otech49+ (2026-05-03), 5 restarts before stable.
Bump livenessProbe.initialDelaySeconds from upstream 10s default to 180s
so kubelet does NOT restart the Pod while the initial cache sync runs
against a CPU-saturated freshly-provisioned k3s apiserver. The Sovereign
apiserver reaches steady-state within ~2 min so 3 min comfortably covers
cold starts. Also bumps periodSeconds=30 + failureThreshold=3 so a
genuinely-hung pod is still killed within ~90s once steady-state.
readinessProbe gets a corresponding initialDelaySeconds=30 so endpoint
flapping during sync doesn't churn services.
Helm overrides REPLACE whole maps (not merge), so the override preserves
the upstream httpGet.path: /healthz + port: http shape verbatim.
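A sketch of what the probe override could look like in the wrapper chart's
values.yaml. The top-level `external-dns:` subchart key is an assumption, as
is repeating the httpGet shape on the readiness probe; the numbers are the
ones stated above:

```yaml
external-dns:                 # assumed subchart values key
  livenessProbe:
    httpGet:                  # upstream shape restated explicitly
      path: /healthz
      port: http
    initialDelaySeconds: 180  # covers the ~2 min apiserver warm-up
    periodSeconds: 30
    failureThreshold: 3       # 3 x 30s = ~90s to kill a hung pod
  readinessProbe:
    httpGet:                  # assumed to mirror the liveness probe
      path: /healthz
      port: http
    initialDelaySeconds: 30
```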
Bumps:
- platform/external-dns/chart/Chart.yaml: 1.1.5 -> 1.1.6
- clusters/_template/bootstrap-kit/12-external-dns.yaml: HelmRelease pin 1.1.5 -> 1.1.6
Co-authored-by: hatiyildiz <hatice@openova.io>
Caught live on otech43–46: external-dns crashloops 10+ times on a fresh
Sovereign before the initial *v1.Pod sync completes. The default 30s timeout
is insufficient when the k3s apiserver is CPU-saturated.
Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(bp-external-dns): remove --pdns-api-version flag, unknown in v0.15.1 (Closes #587)
The native pdns provider in external-dns v0.15.1 does not accept
--pdns-api-version; the binary fatals at startup with:
'unknown long flag --pdns-api-version'
causing CrashLoopBackOff (53+ restarts on otech22).
The provider auto-negotiates the PowerDNS API version — the flag is
superfluous and broken. Remove it from extraArgs.
Bump bp-external-dns 1.1.3 → 1.1.4. Bootstrap-kit slots updated for
_template, otech.omani.works, omantel.omani.works.
Co-Authored-By: alierenbaysal <alierenbaysal@openova.io>
* fix(ci): add bp-reflector slot 5a + bp-external-dns dep to expected-bootstrap-deps.yaml
The dependency-graph-audit check was failing because:
1. 05a-reflector.yaml exists in clusters/_template/bootstrap-kit/ but
bp-reflector was not declared in scripts/expected-bootstrap-deps.yaml
2. bp-external-dns had dependsOn=[bp-cert-manager, bp-powerdns, bp-reflector]
in the HelmRelease but expected-bootstrap-deps.yaml only declared
[bp-cert-manager, bp-powerdns]
Add bp-reflector (slot 5a, depends_on: [bp-cert-manager]) and update
bp-external-dns depends_on to include bp-reflector in the expected DAG.
Co-Authored-By: alierenbaysal <alierenbaysal@openova.io>
---------
Co-authored-by: alierenbaysal <alierenbaysal@openova.io>
bp-powerdns was moved to the `powerdns` namespace in PR #556/#553, but
bp-external-dns still had `powerdnsNamespace: openova-system` in its
NetworkPolicy egress rule and `--pdns-server=...openova-system...` in
extraArgs. Both pointed at the wrong namespace, blocking DNS reconciliation.
Fix:
- externalDns.networkPolicy.powerdnsNamespace: openova-system → powerdns
- extraArgs --pdns-server: ...openova-system... → ...powerdns...
Bump bp-external-dns 1.1.2 → 1.1.3. Bootstrap-kit slot 12 updated.
Co-authored-by: alierenbaysal <alierenbaysal@openova.io>
* fix(bp-external-dns): hide CRD-emitting resources behind Capabilities gates (refs #190)
Wrap the Catalyst overlay's ServiceMonitor and ExternalSecret templates
in `.Capabilities.APIVersions.Has` checks so a cold install on a fresh
Sovereign — where bp-kube-prometheus-stack and bp-external-secrets have
not yet reconciled — no longer fails with `no matches for kind X in
version Y`. The values toggles (`externalDns.serviceMonitor.enabled`,
`externalDns.externalSecret.enabled`) remain — Capabilities is defense
in depth so an operator flipping the toggle on a Sovereign that hasn't
reached Phase 2 doesn't break the bp-external-dns reconcile.
Verified locally: `helm template` with toggles off renders 0 of these
resources; with toggles ON and `--api-versions monitoring.coreos.com/v1
--api-versions external-secrets.io/v1beta1` both render exactly once.
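A sketch of the double gate on the ServiceMonitor template, combining the
values toggle with the Capabilities check. The resource name, selector
label, and port name are illustrative assumptions, not the chart's actual
template body:

```yaml
{{- if and .Values.externalDns.serviceMonitor.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ .Release.Name }}-external-dns      # illustrative name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: external-dns    # assumed Service label
  endpoints:
    - port: http                              # assumed metrics port name
{{- end }}
```

With this shape the resource renders only when the toggle is on AND the
CRD's API version is advertised, which is why the local `helm template`
check needs `--api-versions` to see it.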
Bump version 1.1.0 → 1.1.2 to align with the Phase-1 architectural-fix
wave from issue #190.
* fix(bp-powerdns): hide CRD-emitting resources behind Capabilities gates (refs #190)
Three Catalyst overlay templates emit resources whose CRDs ship in OTHER
charts and were unconditionally rendered, causing a cold install of
bp-powerdns to fail with `no matches for kind X` on a Sovereign that
hasn't yet reconciled the upstream chart:
- cnpg-cluster.yaml → postgresql.cnpg.io/v1 Cluster
(CRD ships in bp-cnpg)
- api-ingress.yaml → traefik.io/v1alpha1 Middleware
(CRD ships with the Traefik controller;
k3s ships it by default but a Sovereign
overlay MAY disable Traefik in favour
of cilium-only ingress)
- crossplane-floatingip.yaml → compose.openova.io/v1alpha1 HetznerFloatingIP
(CRD ships when the Catalyst Crossplane
composition family lands — see GAP
DISCLOSURE in that template)
Each is wrapped in `.Capabilities.APIVersions.Has "<group>/<version>"`.
The Traefik router-middleware annotation on the Ingress is similarly
gated so the auth posture cleanly moves to the Sovereign's chosen
ingress controller when Traefik is absent.
Verified locally: `helm template` with default values renders 0 of
these resources; with `--api-versions postgresql.cnpg.io/v1
--api-versions traefik.io/v1alpha1 --api-versions compose.openova.io/v1alpha1`
plus `--set crossplane.floatingIP.enabled=true`, all three render
exactly once. Existing tests/observability-toggle.sh still passes.
Bump version 1.1.1 → 1.1.2.
* fix(bp-powerdns): bump blueprint.yaml to match Chart.yaml 1.1.2 after Capabilities gate work
---------
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Convert platform/external-dns/chart/ from a metadata-only wrapper to a
proper Helm umbrella that pulls kubernetes-sigs/external-dns 1.15.2
(appVersion 0.15.1, k8s 1.31-validated) as a Helm subchart, mirroring
the bp-cilium / bp-cert-manager / bp-powerdns shape. Native PowerDNS
provider speaks the bp-powerdns REST API directly via the
EXTERNAL_DNS_PDNS_API_KEY env var sourced from the
powerdns-api-credentials Secret bp-powerdns renders.
Catalyst overlay templates added (default-off where applicable per the
observability-toggle rule for the bp-* family):
- templates/networkpolicy.yaml (default ON; egress to powerdns +
cluster DNS + apiserver only)
- templates/servicemonitor.yaml (default OFF)
- templates/externalsecret.yaml (default OFF; Phase-2 OpenBao path)
- templates/_helpers.tpl
Bootstrap-kit Kustomization gets a new 12-external-dns.yaml HelmRelease
referencing bp-external-dns:1.1.0 with dependsOn bp-cert-manager +
bp-powerdns, and the legacy 11-bp-catalyst-platform.yaml is renumbered
13- so the install ordering reads in canonical Phase-0 sequence. Mirrored
to clusters/omantel.omani.works/bootstrap-kit/ with the SOVEREIGN_FQDN
substitution applied.
bp-catalyst-platform Chart.yaml drops bp-external-dns from its
dependency block — install ordering for ExternalDNS is now owned by Flux
dependsOn at the Kustomization layer rather than this umbrella's Helm
dependency graph. Bumped 1.1.0 → 1.1.1 to reflect the dep removal, and
the bootstrap-kit HelmRelease references in both clusters bumped in
lockstep.
Wrapper chart version bumped 1.0.0 → 1.1.1 (umbrella shape).
Local gates pass:
- helm dependency build (pulls external-dns-1.15.2.tgz)
- helm lint (0 failures)
- helm template smoke render (245 lines, 6 kinds rendered)
- helm package + tar-tzf verifies external-dns subchart inside the
packaged tgz (subchart-guard simulation passes)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Live verified on omantel: bp-cilium and bp-cert-manager v1.1.0 fail Helm
install with 'no matches for kind ServiceMonitor in version
monitoring.coreos.com/v1'. Manual kubectl-patch of the live HelmRelease
worked but Flux's 15-min reconcile rolls back the patch because the
HelmRelease CR is owned by the kustomize-controller from git.
Override the values inline in the HelmRelease manifests so the patch is
durable across Flux reconciles. The in-flight observability-toggle agent
will apply the same pattern to all 12 charts in the next chart bump (v1.1.1).
This is the manifest-level workaround that unblocks the running omantel
cluster TODAY without waiting for v1.1.1 publish.
Mirrors the patches into both clusters/_template/bootstrap-kit/ AND
clusters/omantel.omani.works/bootstrap-kit/ so future Sovereigns inherit.
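A sketch of an inline override in a bootstrap-kit HelmRelease. The
namespace, sourceRef name, and the `serviceMonitor.enabled` key path are
hypothetical; each chart's actual toggle path may differ, and older Flux
installs use a `v2beta1`/`v2beta2` apiVersion:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-cert-manager
  namespace: flux-system          # assumed namespace
spec:
  interval: 15m
  chart:
    spec:
      chart: bp-cert-manager
      version: 1.1.0
      sourceRef:
        kind: HelmRepository
        name: bp-charts           # hypothetical source name
  values:
    # Hypothetical key path: keeps the ServiceMonitor out of the render
    # until the monitoring.coreos.com/v1 CRD exists on the cluster.
    serviceMonitor:
      enabled: false
```

Because the values live in git, kustomize-controller reapplies them on every
reconcile instead of reverting them.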
The bp-catalyst-platform umbrella (issue #104) declares a dependency on
bp-external-dns:1.0.0 — but the chart didn't exist; only README + Dynadot
multi-domain policy lived under platform/external-dns/. Without this leaf
the umbrella's `helm dependency build` fails (verified in run 25068433765).
This commit authors the minimal target-state leaf:
- Chart.yaml: name=bp-external-dns, version=1.0.0
- values.yaml: catalystBlueprint.upstream metadata (external-dns 1.15.0
from kubernetes-sigs/external-dns Helm repo) + Catalyst-curated values
overlay (sources, txtOwnerId, ServiceMonitor, RBAC, resources)
Per BLUEPRINT-AUTHORING.md §3, leaf charts are pure values-overlay wrappers:
no templates dir, just Chart.yaml + values.yaml with the catalystBlueprint
metadata block read by the bootstrap-kit installer at helm-install time.
Per-Sovereign provider/zone/credential overrides are overlaid by the
Crossplane Composition that materializes the HelmRelease — keeping this
chart provider-agnostic (no hardcoded Cloudflare/Dynadot/Hetzner choice
per INVIOLABLE-PRINCIPLES.md §4).
After this lands, blueprint-release.yaml will publish
ghcr.io/openova-io/bp-external-dns:1.0.0 and the next umbrella push will
resolve all 11 leaf deps successfully.
Adds platform/external-dns/policies/dynadot-multi-domain.yaml — the
canonical external-dns + dynadot webhook deployment that ships in every
Sovereign on an OpenOva pool domain.
Why a webhook: external-dns has no upstream Dynadot provider; the
canonical pattern is the webhook RPC contract, with a sidecar that
implements the provider in our preferred language. We reuse the same
internal/dynadot/ package the catalyst-api uses, so the never-wipe rule,
record encoding, and managed-domain allowlist are identical on both
write paths (per docs/INVIOLABLE-PRINCIPLES.md #2 — no duplicate
implementations of the same concern).
Multi-domain:
- One --domain-filter per zone in the external-dns args; adding a third
pool domain (e.g. acme.io) is a one-line edit here PLUS a one-key edit
on dynadot-api-credentials' `domains` field. No webhook rebuild.
- Webhook reads DYNADOT_MANAGED_DOMAINS from the same secret with
optional=true, preserving backward compatibility with the legacy
single-`domain` secret shape (pre-#108).
TXT registry:
- --txt-owner-id=$(SOVEREIGN_FQDN), --txt-prefix=_externaldns.<sub>.
- Cluster overlays substitute SOVEREIGN_FQDN via the bp-catalyst-platform
umbrella so two clusters sharing a parent zone (alpha.omani.works,
beta.omani.works) cannot collide.
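A sketch of the container args implied by the multi-domain and TXT registry
notes above. This is a Pod spec fragment, not the shipped policy file; the
FQDN value and the `acme.io` filter are illustrative, and `--txt-prefix` is
omitted because its full value is not spelled out here:

```yaml
containers:
  - name: external-dns
    env:
      - name: SOVEREIGN_FQDN
        value: alpha.omani.works        # substituted per cluster overlay
    args:
      - --provider=webhook              # Dynadot served by the sidecar
      - --registry=txt
      - --txt-owner-id=$(SOVEREIGN_FQDN)   # expanded by Kubernetes from env
      - --domain-filter=omani.works
      - --domain-filter=acme.io         # example of adding a pool domain
```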
Closes #109.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7 more component READMEs got role-in-Catalyst banners:
- vpa, keda, reloader → per-host-cluster scaling/ops layer (§3.4).
Reloader specifically calls out its role in Catalyst's secret-
rotation flow (rolling deploy on K8s Secret hash change).
- external-dns → per-host-cluster DNS-sync (§3.1); pairs with k8gb
for the GSLB zone separation.
- coraza → DMZ-block WAF on every host cluster (§3.1).
- crossplane → per-Sovereign on the management cluster (§3.2);
banner explicitly emphasizes the agreed "never a user-facing
surface" rule (Users don't write Compositions in Application
configs; Blueprint authors and advanced contributors do). Cross-
references the no-fourth-surface clause in ARCHITECTURE §4/§7
and the Crossplane Composition section in BLUEPRINT-AUTHORING §8.
- opentofu → repositioned as Phase-0-only, runs on
`catalyst-provisioner` only, NOT installed on host clusters at runtime.
opentofu drift fixes (uncovered by line-by-line read):
- Section 5 line 182: "Bootstrap Wizard prompts for cloud credentials"
→ "Catalyst Bootstrap (Phase 0) prompts for cloud credentials"
(banned term).
- Same section line 186: "ESO PushSecrets sync to both regional
OpenBao instances" — the active-active drift Pass 7 corrected
elsewhere, still here. Replaced with "writes go to the primary
OpenBao region only; replicas pick up via async perf replication".
VALIDATION-LOG: Pass 10 entry added.
Refs #37
Remove hierarchical grouping (networking/, security/, etc.) and use flat
structure for all 41 platform components.
Changes:
- All components now directly under platform/ (no subfolders)
- AI Hub components moved from meta-platforms/ai-hub/components/ to platform/
- Open Banking components (lago, openmeter) moved to platform/
- meta-platforms/ now only contains README files that reference platform/
- Open Banking custom services remain in meta-platforms/open-banking/services/
Structure:
- platform/ (41 components, flat)
- meta-platforms/ai-hub/ (README only, references platform/)
- meta-platforms/open-banking/ (README + 6 custom services)
All documentation links updated.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>