PR #1888 (TBD-A30) fixed catalyst-system HTTPRoutes for multi-zone Sovereigns whose Cilium Gateway renames HTTPS listeners from `https` to `https-<sanitised-zone>` (e.g. `https-omani-works`, `https-omani-homes`) when more than one parent zone is enabled. Every public HTTPRoute pinned to `sectionName: https` got `Accepted=False NoMatchingListener` and the hosted service 404'd / connection-refused.

That fix only touched products/catalyst/chart. Per-blueprint HTTPRoutes shipped the same `sectionName: https` default in values.yaml, so on a multi-zone Sovereign every blueprint route (gitea, grafana, harbor, keycloak, newapi, openbao, powerdns, stalwart-tenant) silently failed to attach. TBD-A40 / issue #1902.

Sweep verbatim:

    $ git grep -nE 'sectionName:[[:space:]]+(https|"https")[[:space:]]*$' \
        platform/*/chart/ products/ clusters/ core/ 2>/dev/null \
        | grep -v 'platform/gateway-api/chart/templates'
    platform/gitea/chart/values.yaml:168: sectionName: https
    platform/grafana/chart/values.yaml:124: sectionName: https
    platform/harbor/chart/values.yaml:437: sectionName: https
    platform/keycloak/chart/values.yaml:482: sectionName: https
    platform/newapi/chart/values.yaml:721: sectionName: https
    platform/openbao/chart/values.yaml:72: sectionName: https
    platform/powerdns/chart/values.yaml:407: sectionName: https
    platform/stalwart-tenant/chart/values.yaml:297: sectionName: https
    products/catalyst/bootstrap/api/internal/handler/sme_tenant_gitops.go:802: sectionName: https

Fix (Option C: omit sectionName, same as PR #1888):

- 8 blueprint values.yaml defaults flipped from `sectionName: https` to `sectionName: ""`. The chart templates already guard with `{{- with .Values.gateway.parentRef.sectionName }}`, so a blank value drops the field entirely and Cilium Gateway matches by hostname filter.
- platform/newapi/chart/templates/httproute.yaml was the outlier: it used `default "https" $parent.sectionName` which fell back to `https` even when values.yaml said empty. Rewritten to `{{- with $parent.sectionName }}` so empty drops the field, the same pattern as the other 7 blueprints.
- products/catalyst/bootstrap/api/internal/handler/sme_tenant_gitops.go renders a per-tenant bp-keycloak HelmRelease and injected `sectionName: https` into spec.values. Flipped to `sectionName: ""` so the bp-keycloak chart's `{{- with }}` guard drops the field.

Validation (real `helm template`, default values, gateway enabled, no sectionName override), Principle #15:

    gitea           : sectionName lines in rendered output = 0
    grafana         : sectionName lines in rendered output = 0
    harbor          : sectionName lines in rendered output = 0
    keycloak        : sectionName lines in rendered output = 0
    openbao         : sectionName lines in rendered output = 0
    powerdns        : sectionName lines in rendered output = 0
    newapi          : sectionName lines in rendered output = 0
    stalwart-tenant : sectionName lines in rendered output = 0

Override path preserved: `--set ...parentRef.sectionName=https-omani-works` on each chart renders `sectionName: "https-omani-works"` correctly, so operators on single-zone clusters or non-Cilium gateways can still pin explicitly via bootstrap-kit overlay.

helm lint clean on all 8 blueprint charts (newapi cnpg-cluster.yaml lint error is pre-existing on origin/main, unrelated to this fix).

Chart bumps (each blueprint also bumps blueprint.yaml spec.version per #817 lockstep):

    bp-gitea           1.2.7  -> 1.2.8
    bp-grafana         1.0.1  -> 1.0.2
    bp-harbor          1.2.17 -> 1.2.18
    bp-keycloak        1.4.5  -> 1.4.6
    bp-newapi          1.4.22 -> 1.4.23
    bp-openbao         1.2.16 -> 1.2.17
    bp-powerdns        1.2.3  -> 1.2.4
    bp-stalwart-tenant 0.1.2  -> 0.1.3

Refs TBD-A40.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
OpenBao
Secrets management backend for Catalyst. Apache 2.0 / MPL 2.0 fork of HashiCorp Vault, drop-in API-compatible.
Status: Accepted | Updated: 2026-04-27
Catalyst role: Per-Sovereign supporting service in the Catalyst control plane (see docs/PLATFORM-TECH-STACK.md §2.3). For multi-region semantics and rotation policy, docs/SECURITY.md is canonical.
Overview
OpenBao is a Linux Foundation project forked from HashiCorp Vault after HashiCorp changed Vault's license from MPL 2.0 to the Business Source License (BSL 1.1). OpenBao retains the open license and provides API-compatible secrets management.
OpenBao provides centralized secrets management with:
- Secrets stored securely outside of Git (Git holds only `ExternalSecret` references).
- Independent Raft cluster per region (no stretched cluster).
- Asynchronous Performance Replication from primary region to standbys.
- Integration with External Secrets Operator (ESO).
- Workload authentication via SPIFFE SVID — short-lived, auto-rotating.
Architecture: independent Raft per region (NOT a stretched cluster)
Each region runs its own 3-node Raft cluster. Quorum is intra-region only — region failures are independent failure domains. Cross-region replication is asynchronous Performance Replication from primary → secondaries.
flowchart TB
subgraph Region1["Region 1 (primary)"]
V1[OpenBao 3-node Raft]
ES1[ExternalSecret CR]
KS1[K8s Secret]
end
subgraph Region2["Region 2 (replica)"]
V2[OpenBao 3-node Raft<br>independent quorum]
ES2[ExternalSecret CR]
KS2[K8s Secret]
end
subgraph Region3["Region 3 (DR replica)"]
V3[OpenBao 3-node Raft<br>independent quorum]
ES3[ExternalSecret CR]
KS3[K8s Secret]
end
V1 -.->|"async perf replication"| V2
V1 -.->|"async perf replication"| V3
V1 -->|"local read"| ES1
V2 -->|"local read"| ES2
V3 -->|"local read"| ES3
ES1 -->|"materialize"| KS1
ES2 -->|"materialize"| KS2
ES3 -->|"materialize"| KS3
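The independent-quorum property in the diagram can be checked with plain majority arithmetic. The following is a minimal sketch, not part of the chart: the function names and the per-region health snapshot are hypothetical, and only the majority rule for Raft (more than half of the voters) is taken as given.

```python
def raft_quorum(nodes: int) -> int:
    """Smallest majority of a Raft cluster with `nodes` voters."""
    return nodes // 2 + 1


def has_quorum(healthy: int, nodes: int = 3) -> bool:
    """True if a single region's cluster can still elect a leader."""
    return healthy >= raft_quorum(nodes)


# Hypothetical snapshot: number of healthy OpenBao nodes per region.
healthy_nodes = {"region1": 3, "region2": 0, "region3": 2}

# Quorum is evaluated per region, never across regions.
quorum_by_region = {region: has_quorum(h) for region, h in healthy_nodes.items()}
print(quorum_by_region)
# → {'region1': True, 'region2': False, 'region3': True}
```

A 3-node cluster keeps quorum with one node down, and the total loss of region2 in this snapshot has no effect on the quorum of region1 or region3.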
Key design (canonical in docs/SECURITY.md §5):
- Independent Raft per region. No cross-region quorum. A whole-region failure does NOT block any other region.
- Single-primary writes. Rotations and new-secret writes go to the primary OpenBao only.
- Async perf replication. Lag <1s typical; replicas serve reads at sub-10ms latency.
- Explicit DR promotion. Either sovereign-admin-approved or automated via failover-controller (with strict criteria, not on every blip).
- Apps read locally. Each region's ExternalSecret pulls from its local OpenBao replica.
- No SOPS. Plaintext never in Git.
The earlier active-active bidirectional design was rejected as a stretched cluster — it would have made one region's network blip take down all writes. This file's architecture matches the agreed independent-Raft model.
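The read/write split described above can be summarised as a small routing rule. This is an illustrative sketch only: the `Route` type, the `route` function, and the operation names are hypothetical, while the store names `bao-local` and `bao-primary` are the ones used in the manifests in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    region: str  # region whose OpenBao cluster serves the request
    store: str   # ClusterSecretStore name used to reach it


# Hypothetical operation names covering rotations, creates and policy updates.
WRITE_OPS = {"create", "rotate", "update-policy"}


def route(op: str, caller_region: str, primary_region: str) -> Route:
    """Writes go to the primary; reads stay in the caller's own region."""
    if op in WRITE_OPS:
        return Route(primary_region, "bao-primary")
    if op == "read":
        return Route(caller_region, "bao-local")
    raise ValueError(f"unknown operation: {op}")


print(route("read", "region2", "region1"))
print(route("rotate", "region2", "region1"))
```

A read issued in region2 resolves to region2's local replica, while a rotation issued from the same region is sent to the primary in region1.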
Deployment Options
| Option | Type | Notes |
|---|---|---|
| OpenBao Self-Hosted | Self-hosted | Full control, one per cluster |
| AWS Secrets Manager | Managed | If AWS chosen |
| GCP Secret Manager | Managed | If GCP chosen |
| Azure Key Vault | Managed | If Azure chosen |
Recommended: OpenBao Self-Hosted for full control
Configuration
OpenBao Deployment (Helm)
server:
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        storage "raft" {
          path = "/openbao/data"
        }
  dataStorage:
    enabled: true
    size: 10Gi
    storageClass: <storage-class>
  ingress:
    enabled: true
    ingressClassName: cilium
    hosts:
      - host: bao.<location-code>.<sovereign-domain>
injector:
  enabled: false  # Using ESO instead
ClusterSecretStore (local read)
Each region defines ONE ClusterSecretStore pointing at its local OpenBao replica. Apps in any region read from their local replica only — replication delivers post-write values within seconds.
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: bao-local
spec:
  provider:
    vault:  # ESO provider type stays `vault`;
            # OpenBao is wire-compatible.
      server: "https://bao.<location-code>.<sovereign-domain>"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
Note: The ESO provider type remains `vault` because OpenBao is API-compatible and ESO uses the same provider configuration.
Writes go to the primary region
Secret rotations, new-secret creates, and policy updates target the primary OpenBao only. Replicas refuse writes (Performance Replication is one-way: primary → standby). The ESO PushSecret is configured to point at the primary's ClusterSecretStore explicitly:
apiVersion: external-secrets.io/v1alpha1
kind: PushSecret
metadata:
  name: push-db-credentials
  namespace: databases
spec:
  refreshInterval: 1h
  secretStoreRefs:
    - name: bao-primary  # writes target the primary region only
      kind: ClusterSecretStore
  selector:
    secret:
      name: db-credentials
  data:
    - match:
        secretKey: password
        remoteRef:
          remoteKey: databases/db-credentials
          property: password
ExternalSecret (local read in every region)
Reads always pull from the local OpenBao replica.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: databases
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: bao-local
    kind: ClusterSecretStore
  target:
    name: db-credentials
    creationPolicy: Owner
  data:
    - secretKey: password
      remoteRef:
        key: databases/db-credentials
        property: password
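To make the `key` / `property` pair above concrete, the sketch below shows how a KV v2 read is laid out on the Vault-compatible HTTP API: the mount (`secret`) is followed by `data/` and then the key, and the stored fields sit under `data.data` in the JSON response. This is an assumption-labelled illustration, not ESO's actual code path; the helper names, the hostname, and the sample response are hypothetical.

```python
def kv2_read_url(server: str, mount: str, key: str) -> str:
    """Build the KV v2 read URL: <server>/v1/<mount>/data/<key>."""
    return f"{server.rstrip('/')}/v1/{mount.strip('/')}/data/{key.strip('/')}"


def extract_property(response: dict, prop: str) -> str:
    """KV v2 wraps the stored fields in data.data; metadata sits beside them."""
    return response["data"]["data"][prop]


# Placeholder hostname standing in for bao.<location-code>.<sovereign-domain>.
url = kv2_read_url("https://bao.example.internal", "secret", "databases/db-credentials")
print(url)
# → https://bao.example.internal/v1/secret/data/databases/db-credentials

# Hypothetical response body in the KV v2 shape.
sample_response = {
    "data": {
        "data": {"password": "not-a-real-password"},
        "metadata": {"version": 1},
    }
}
print(extract_property(sample_response, "password"))
# → not-a-real-password
```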
DR promotion
If the primary region fails, a replica is explicitly promoted (sovereign-admin approval or failover-controller automation). New writes are blocked briefly during promotion (~30s), then the new primary accepts writes. See docs/SECURITY.md §5.2.
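The promotion sequence can be pictured as a small state machine: an approved promotion opens a window in which no region accepts writes, and the window closes once the promoted replica becomes primary. The class below is a hypothetical model for illustration only; it is not the failover-controller, and the approver strings simply mirror the two approval paths named above.

```python
class ReplicationTopology:
    """Toy model of explicit DR promotion (hypothetical, for illustration)."""

    APPROVERS = {"sovereign-admin", "failover-controller"}

    def __init__(self, primary: str, replicas: list[str]):
        self.primary = primary
        self.replicas = set(replicas)
        self._candidate = None  # set only while a promotion is in flight

    def begin_promotion(self, candidate: str, approved_by: str) -> None:
        if approved_by not in self.APPROVERS:
            raise PermissionError(f"{approved_by} may not approve a promotion")
        if candidate not in self.replicas:
            raise ValueError(f"{candidate} is not a replica")
        self._candidate = candidate

    def finish_promotion(self) -> None:
        if self._candidate is None:
            raise RuntimeError("no promotion in progress")
        self.replicas.discard(self._candidate)
        self.primary = self._candidate  # the failed primary is not re-added here
        self._candidate = None

    def accepts_write(self, region: str) -> bool:
        """Writes are blocked everywhere while a promotion is in flight."""
        return self._candidate is None and region == self.primary


topology = ReplicationTopology("region1", ["region2", "region3"])
topology.begin_promotion("region2", approved_by="sovereign-admin")
during = topology.accepts_write("region2")  # False: promotion window
topology.finish_promotion()
after = topology.accepts_write("region2")   # True: region2 is the new primary
print(during, after)
```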
Bootstrap Procedure
- Catalyst bootstrap (Phase 0 of Sovereign provisioning) deploys OpenBao as independent Raft cluster per region (no stretched cluster; see docs/SECURITY.md §5).
- Auto-unseal flow (issue #316, chart v1.2.0+): Cloud-init on the control-plane node generates a 32-byte recovery seed, writes it to a single-use K8s Secret `openbao-recovery-seed` in the `openbao` namespace. The bp-openbao Helm chart's post-install init Job (hook weight 5) consumes the seed, calls `bao operator init -recovery-shares=1 -recovery-threshold=1`, persists the recovery key inside OpenBao's auto-unseal config, and deletes the seed Secret on success. The recovery key + root token live ONLY inside OpenBao's Raft state, never in a K8s Secret. Subsequent pod restarts unseal automatically without operator intervention. Set `autoUnseal.enabled=true` (default off; cluster overlay flips it on per-Sovereign).
- Kubernetes auth bootstrap (issue #316): A second post-install Job (hook weight 10) enables the Kubernetes auth method, mounts kv-v2 at `secret/`, writes the `external-secrets-read` policy, and binds the `external-secrets` role to the ESO ServiceAccount in `external-secrets-system`. ESO's ClusterSecretStore `vault-region1` (platform/external-secrets) authenticates via this role on every secret read. Configure under `autoUnseal.kubernetesAuth.*`.
- Cross-region async perf replication is configured for read availability and DR.
- ESO configured with local-region ClusterSecretStores; cross-region reads via the same workload SVID.
- Initial secrets created via K8s + PushSecrets, never plaintext in Git.
No SOPS: Credentials entered interactively during bootstrap, never stored in Git. See docs/SECURITY.md.
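As a rough illustration of the seed step in the auto-unseal flow, the sketch below generates a 32-byte random seed and wraps it in a Secret manifest for the `openbao` namespace. It is not the cloud-init script itself: the helper names and the `seed` data key are assumptions, and only the Secret name, namespace, and seed length come from the procedure above.

```python
import base64
import secrets


def new_recovery_seed(num_bytes: int = 32) -> bytes:
    """Generate a cryptographically strong random seed."""
    return secrets.token_bytes(num_bytes)


def seed_secret_manifest(seed: bytes) -> dict:
    """Build a single-use Secret manifest carrying the seed."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": "openbao-recovery-seed", "namespace": "openbao"},
        "type": "Opaque",
        # Secret `data` values are base64-encoded; the key name "seed" is hypothetical.
        "data": {"seed": base64.b64encode(seed).decode("ascii")},
    }


manifest = seed_secret_manifest(new_recovery_seed())
print(manifest["metadata"])
```

The seed never needs to be written to Git: it exists only in the Secret until the init Job consumes and deletes it.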
Auto-unseal alternatives (out of scope for solo Sovereign)
| Option | When applicable |
|---|---|
| A. Shamir + cloud-init seed | Default for solo Sovereign — implemented in chart v1.2.0. No managed-KMS dependency; the recovery key is generated on the control-plane at provision time and persisted only inside OpenBao's own Raft state. |
| B. Transit-seal via peer OpenBao | Multi-region tier-1 corporate cluster (one Sovereign unseals another). Out of scope for omantel/single-region. |
| C. Cloud-KMS auto-unseal (AWS KMS, GCP KMS, Azure Key Vault) | When the Sovereign runs on a hyperscaler that provides managed-KMS. Hetzner has no managed-KMS — Option A is the only viable path on Hetzner. |
| D. Operator-supplied recovery shards (air-gap) | Documented in docs/SECURITY.md. Used when no automated boot-time secret pipeline is acceptable. |
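The table can be read as a decision rule. The sketch below encodes one possible reading; the function name, its boolean inputs, and the precedence between options are assumptions made for illustration, not a rule stated in docs/SECURITY.md.

```python
def pick_unseal_option(*, air_gapped: bool, managed_kms: bool, peer_openbao: bool) -> str:
    """Return the option letter from the table above (precedence is an assumption)."""
    if air_gapped:
        return "D"  # operator-supplied recovery shards
    if managed_kms:
        return "C"  # cloud-KMS auto-unseal on a hyperscaler
    if peer_openbao:
        return "B"  # transit-seal via a peer OpenBao
    return "A"      # Shamir + cloud-init seed, the solo-Sovereign default


# A single-region Sovereign on Hetzner: no managed KMS, no peer, not air-gapped.
print(pick_unseal_option(air_gapped=False, managed_kms=False, peer_openbao=False))
# → A
```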
Part of OpenOva