openova/platform/openbao
e3mrah 0a45a790e7
fix: omit HTTPRoute sectionName across blueprint charts — match PR #1888 pattern (Closes #1902) (#1909)
PR #1888 (TBD-A30) fixed catalyst-system HTTPRoutes for multi-zone
Sovereigns whose Cilium Gateway renames HTTPS listeners from `https` to
`https-<sanitised-zone>` (e.g. `https-omani-works`, `https-omani-homes`)
when more than one parent zone is enabled. Every public HTTPRoute pinned
to `sectionName: https` got `Accepted=False NoMatchingListener` and the
hosted service 404'd / connection-refused.

That fix only touched products/catalyst/chart. Per-blueprint HTTPRoutes
shipped the same `sectionName: https` default in values.yaml, so on a
multi-zone Sovereign every blueprint route — gitea, grafana, harbor,
keycloak, newapi, openbao, powerdns, stalwart-tenant — silently failed
to attach. TBD-A40 / issue #1902.

Sweep verbatim:

  $ git grep -nE 'sectionName:[[:space:]]+(https|"https")[[:space:]]*$' \
      platform/*/chart/ products/ clusters/ core/ 2>/dev/null \
      | grep -v 'platform/gateway-api/chart/templates'
  platform/gitea/chart/values.yaml:168:    sectionName: https
  platform/grafana/chart/values.yaml:124:    sectionName: https
  platform/harbor/chart/values.yaml:437:    sectionName: https
  platform/keycloak/chart/values.yaml:482:    sectionName: https
  platform/newapi/chart/values.yaml:721:      sectionName: https
  platform/openbao/chart/values.yaml:72:    sectionName: https
  platform/powerdns/chart/values.yaml:407:      sectionName: https
  platform/stalwart-tenant/chart/values.yaml:297:      sectionName: https
  products/catalyst/bootstrap/api/internal/handler/sme_tenant_gitops.go:802:        sectionName: https

Fix (Option C — omit sectionName, same as PR #1888):

  - 8 blueprint values.yaml defaults flipped from `sectionName: https` to
    `sectionName: ""`. The chart templates already guard with `{{- with
    .Values.gateway.parentRef.sectionName }}`, so a blank value drops the
    field entirely and Cilium Gateway matches by hostname filter.

  - platform/newapi/chart/templates/httproute.yaml was the outlier: it
    used `default "https" $parent.sectionName` which fell back to `https`
    even when values.yaml said empty. Rewritten to `{{- with
    $parent.sectionName }}` so empty drops the field — same pattern as
    the other 7 blueprints.

  - products/catalyst/bootstrap/api/internal/handler/sme_tenant_gitops.go
    renders a per-tenant bp-keycloak HelmRelease and injected
    `sectionName: https` into spec.values. Flipped to `sectionName: ""`
    so the bp-keycloak chart's `{{- with }}` guard drops the field.

Validation (real `helm template`, default values, gateway enabled, no
sectionName override) — Principle #15:

  gitea            : sectionName lines in rendered output = 0
  grafana          : sectionName lines in rendered output = 0
  harbor           : sectionName lines in rendered output = 0
  keycloak         : sectionName lines in rendered output = 0
  openbao          : sectionName lines in rendered output = 0
  powerdns         : sectionName lines in rendered output = 0
  newapi           : sectionName lines in rendered output = 0
  stalwart-tenant  : sectionName lines in rendered output = 0

Override path preserved — `--set ...parentRef.sectionName=https-omani-works`
on each chart renders `sectionName: "https-omani-works"` correctly,
so operators on single-zone clusters or non-Cilium gateways can still
pin explicitly via bootstrap-kit overlay.

helm lint clean on all 8 blueprint charts (newapi cnpg-cluster.yaml lint
error is pre-existing on origin/main, unrelated to this fix).

Chart bumps (each blueprint also bumps blueprint.yaml spec.version per
#817 lockstep):
  bp-gitea            1.2.7  -> 1.2.8
  bp-grafana          1.0.1  -> 1.0.2
  bp-harbor           1.2.17 -> 1.2.18
  bp-keycloak         1.4.5  -> 1.4.6
  bp-newapi           1.4.22 -> 1.4.23
  bp-openbao          1.2.16 -> 1.2.17
  bp-powerdns         1.2.3  -> 1.2.4
  bp-stalwart-tenant  0.1.2  -> 0.1.3

Refs TBD-A40.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 07:57:12 +04:00
README.md feat(bp-openbao): auto-unseal flow — cloud-init seed + post-install init Job (closes #316) (#408) 2026-05-01 16:45:44 +04:00

OpenBao

Secrets management backend for Catalyst. MPL 2.0-licensed fork of HashiCorp Vault, drop-in API-compatible.

Status: Accepted | Updated: 2026-04-27

Catalyst role: Per-Sovereign supporting service in the Catalyst control plane (see docs/PLATFORM-TECH-STACK.md §2.3). For multi-region semantics and rotation policy, docs/SECURITY.md is canonical.


Overview

OpenBao is a Linux Foundation project forked from HashiCorp Vault after HashiCorp changed Vault's license from MPL 2.0 to the Business Source License (BSL 1.1). OpenBao retains the open license and provides API-compatible secrets management.

OpenBao provides centralized secrets management with:

  • Secrets stored securely outside of Git (Git holds only ExternalSecret references).
  • Independent Raft cluster per region (no stretched cluster).
  • Asynchronous Performance Replication from primary region to standbys.
  • Integration with External Secrets Operator (ESO).
  • Workload authentication via SPIFFE SVID — short-lived, auto-rotating.

Architecture: independent Raft per region (NOT a stretched cluster)

Each region runs its own 3-node Raft cluster. Quorum is intra-region only — region failures are independent failure domains. Cross-region replication is asynchronous Performance Replication from primary → secondaries.

flowchart TB
    subgraph Region1["Region 1 (primary)"]
        V1[OpenBao 3-node Raft]
        ES1[ExternalSecret CR]
        KS1[K8s Secret]
    end

    subgraph Region2["Region 2 (replica)"]
        V2[OpenBao 3-node Raft<br>independent quorum]
        ES2[ExternalSecret CR]
        KS2[K8s Secret]
    end

    subgraph Region3["Region 3 (DR replica)"]
        V3[OpenBao 3-node Raft<br>independent quorum]
        ES3[ExternalSecret CR]
        KS3[K8s Secret]
    end

    V1 -.->|"async perf replication"| V2
    V1 -.->|"async perf replication"| V3
    V1 -->|"local read"| ES1
    V2 -->|"local read"| ES2
    V3 -->|"local read"| ES3
    ES1 -->|"materialize"| KS1
    ES2 -->|"materialize"| KS2
    ES3 -->|"materialize"| KS3

Key design (canonical in docs/SECURITY.md §5):

  • Independent Raft per region. No cross-region quorum. A whole-region failure does NOT block any other region.
  • Single-primary writes. Rotations and new-secret writes go to the primary OpenBao only.
  • Async perf replication. Lag <1s typical; replicas serve reads at sub-10ms latency.
  • Explicit DR promotion. Either sovereign-admin-approved or automated via failover-controller (with strict criteria — not on every blip).
  • Apps read locally. Each region's ExternalSecret pulls from its local OpenBao replica.
  • No SOPS. Plaintext never in Git.

The earlier active-active bidirectional design was rejected as a stretched cluster — it would have made one region's network blip take down all writes. This file's architecture matches the agreed independent-Raft model.


Deployment Options

| Option | Type | Notes |
|--------|------|-------|
| OpenBao Self-Hosted | Self-hosted | Full control, one per cluster |
| AWS Secrets Manager | Managed | If AWS chosen |
| GCP Secret Manager | Managed | If GCP chosen |
| Azure Key Vault | Managed | If Azure chosen |

Recommended: OpenBao Self-Hosted for full control


Configuration

OpenBao Deployment (Helm)

server:
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        storage "raft" {
          path = "/openbao/data"
        }

  dataStorage:
    enabled: true
    size: 10Gi
    storageClass: <storage-class>

  ingress:
    enabled: true
    ingressClassName: cilium
    hosts:
      - host: bao.<location-code>.<sovereign-domain>

injector:
  enabled: false  # Using ESO instead
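
The values above set only the Raft storage path. For three replicas to form one intra-region cluster, each node also needs a listener and a way to find its peers. The following is a minimal sketch of a fuller `config` block, not the bp-openbao chart's actual values. It assumes the pods are named `openbao-0` to `openbao-2` behind a headless Service called `openbao-internal`; those names, the ports, and the TLS setting are all assumptions.

```yaml
# Sketch only: pod/Service names, ports, and the TLS setting are assumptions.
server:
  ha:
    enabled: true
    replicas: 3
    raft:
      enabled: true
      config: |
        listener "tcp" {
          address         = "[::]:8200"
          cluster_address = "[::]:8201"
          # Assumption: TLS terminates in front of OpenBao; enable TLS here otherwise.
          tls_disable     = true
        }

        storage "raft" {
          path = "/openbao/data"

          # One retry_join stanza per peer in the SAME region (no cross-region peers).
          retry_join {
            leader_api_addr = "http://openbao-0.openbao-internal:8200"
          }
          retry_join {
            leader_api_addr = "http://openbao-1.openbao-internal:8200"
          }
          retry_join {
            leader_api_addr = "http://openbao-2.openbao-internal:8200"
          }
        }

        # Labels pods with their active/standby status so Services can select the active node.
        service_registration "kubernetes" {}
```

Listing only same-region peers in `retry_join` is what keeps quorum intra-region, in line with the independent-Raft design described earlier.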

ClusterSecretStore (local read)

Each region defines ONE ClusterSecretStore pointing at its local OpenBao replica. Apps in any region read from their local replica only — replication delivers post-write values within seconds.

apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: bao-local
spec:
  provider:
    vault:                                # ESO provider type stays `vault` —
                                          # OpenBao is wire-compatible.
      server: "https://bao.<location-code>.<sovereign-domain>"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"

Note: The ESO provider type remains vault because OpenBao is API-compatible and ESO uses the same provider configuration.

Writes go to the primary region

Secret rotations, new-secret creates, and policy updates target the primary OpenBao only. Replicas refuse writes (Performance Replication is one-way: primary → standby). The ESO PushSecret is configured to point at the primary's ClusterSecretStore explicitly:

apiVersion: external-secrets.io/v1alpha1
kind: PushSecret
metadata:
  name: push-db-credentials
  namespace: databases
spec:
  refreshInterval: 1h
  secretStoreRefs:
    - name: bao-primary                   # writes target the primary region only
      kind: ClusterSecretStore
  selector:
    secret:
      name: db-credentials
  data:
    - match:
        secretKey: password
        remoteRef:
          remoteKey: databases/db-credentials
          property: password
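
The PushSecret above refers to a ClusterSecretStore named `bao-primary`, which this README does not define. A minimal sketch of what it might look like, mirroring `bao-local`, is shown here. The `<primary-location-code>` placeholder is an assumption, and so is the reuse of the `external-secrets` role.

```yaml
# Sketch only: the primary hostname placeholder and the reused role are assumptions.
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: bao-primary
spec:
  provider:
    vault:
      # Points at the primary region's OpenBao, not the local replica.
      server: "https://bao.<primary-location-code>.<sovereign-domain>"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
```

The role used for pushes needs a policy that permits writes. The bootstrap procedure in this README only describes a read policy (`external-secrets-read`), so a separate write-capable role would likely be required in practice. How a cluster in a replica region authenticates against the primary is not covered here.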

ExternalSecret (local read in every region)

Reads always pull from the local OpenBao replica.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: databases
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: bao-local
    kind: ClusterSecretStore
  target:
    name: db-credentials
    creationPolicy: Owner
  data:
    - secretKey: password
      remoteRef:
        key: databases/db-credentials
        property: password
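
Once the ExternalSecret has materialized the `db-credentials` Secret, a workload consumes it like any other Kubernetes Secret. The sketch below shows one way to do that; the Deployment name, labels, image placeholder, and environment variable name are hypothetical and not taken from this repository.

```yaml
# Sketch only: Deployment name, labels, image, and env var name are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db-client
  namespace: databases
spec:
  replicas: 1
  selector:
    matchLabels:
      app: db-client
  template:
    metadata:
      labels:
        app: db-client
    spec:
      containers:
        - name: app
          image: <app-image>
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials   # target.name of the ExternalSecret
                  key: password          # secretKey of the ExternalSecret
```

Environment variables are read once at container start, so a pod using this pattern picks up a rotated value only after a restart. Mounting the Secret as a volume is the usual alternative when live refresh matters.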

DR promotion

If the primary region fails, a replica is explicitly promoted (sovereign-admin approval or failover-controller automation). New writes are blocked briefly during promotion (~30s), then the new primary accepts writes. See docs/SECURITY.md §5.2.


Bootstrap Procedure

  1. Catalyst bootstrap (Phase 0 of Sovereign provisioning) deploys OpenBao as an independent Raft cluster per region (no stretched cluster; see docs/SECURITY.md §5).
  2. Auto-unseal flow (issue #316, chart v1.2.0+): Cloud-init on the control-plane node generates a 32-byte recovery seed and writes it to a single-use K8s Secret, openbao-recovery-seed, in the openbao namespace. The bp-openbao Helm chart's post-install init Job (hook weight 5) consumes the seed, calls bao operator init -recovery-shares=1 -recovery-threshold=1, persists the recovery key inside OpenBao's auto-unseal config, and deletes the seed Secret on success. The recovery key and root token live ONLY inside OpenBao's Raft state, never in a K8s Secret. Subsequent pod restarts unseal automatically without operator intervention. To enable the flow, set autoUnseal.enabled=true (default off; the cluster overlay flips it on per Sovereign).
  3. Kubernetes auth bootstrap (issue #316): A second post-install Job (hook weight 10) enables the Kubernetes auth method, mounts kv-v2 at secret/, writes the external-secrets-read policy, and binds the external-secrets role to the ESO ServiceAccount in external-secrets-system. ESO's ClusterSecretStore vault-region1 (platform/external-secrets) authenticates via this role on every secret read. Configure under autoUnseal.kubernetesAuth.*.
  4. Cross-region async perf replication is configured for read availability and DR.
  5. ESO is configured with local-region ClusterSecretStores; cross-region reads use the same workload SVID.
  6. Initial secrets are created via K8s Secrets + PushSecrets, never as plaintext in Git.
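
Step 3 above can be pictured as a Helm hook Job that runs a handful of `bao` CLI commands. The following is an illustrative sketch under stated assumptions, not the bp-openbao chart's template. The Job name, image placeholder, Service address, ESO ServiceAccount name, policy path, and token TTL are hypothetical, and the way the Job obtains a privileged token is not described in this README.

```yaml
# Sketch only: not the bp-openbao template. Values marked hypothetical are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: openbao-k8s-auth-bootstrap          # hypothetical
  namespace: openbao
  annotations:
    "helm.sh/hook": post-install
    "helm.sh/hook-weight": "10"             # runs after the init Job (weight 5)
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: bootstrap
          image: <openbao-image>
          env:
            - name: BAO_ADDR
              value: "http://openbao.openbao.svc:8200"   # hypothetical Service address
            # Assumption: a privileged BAO_TOKEN is supplied by the chart;
            # this README does not say how.
          command: ["/bin/sh", "-ec"]
          args:
            - |
              # Enable the Kubernetes auth method and point it at the in-cluster API.
              bao auth enable kubernetes
              bao write auth/kubernetes/config \
                kubernetes_host="https://kubernetes.default.svc"

              # Mount a KV v2 engine at secret/.
              bao secrets enable -path=secret kv-v2

              # Read-only policy for ESO (KV v2 reads go through the data/ prefix).
              bao policy write external-secrets-read - <<'EOF'
              path "secret/data/*" {
                capabilities = ["read"]
              }
              EOF

              # Bind the role to the ESO ServiceAccount (name is hypothetical).
              bao write auth/kubernetes/role/external-secrets \
                bound_service_account_names=external-secrets \
                bound_service_account_namespaces=external-secrets-system \
                policies=external-secrets-read \
                ttl=1h
```

As written, the script is not idempotent: `bao auth enable` and `bao secrets enable` fail if the mount already exists, so a production Job would need to tolerate re-runs.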

No SOPS: Credentials are entered interactively during bootstrap and never stored in Git. See docs/SECURITY.md.

Auto-unseal alternatives (out of scope for solo Sovereign)

| Option | When applicable |
|--------|-----------------|
| A. Shamir + cloud-init seed | Default for solo Sovereign (implemented in chart v1.2.0). No managed-KMS dependency; the recovery key is generated on the control-plane at provision time and persisted only inside OpenBao's own Raft state. |
| B. Transit-seal via peer OpenBao | Multi-region tier-1 corporate cluster (one Sovereign unseals another). Out of scope for omantel/single-region. |
| C. Cloud-KMS auto-unseal (AWS KMS, GCP KMS, Azure Key Vault) | When the Sovereign runs on a hyperscaler that provides managed-KMS. Hetzner has no managed-KMS, so Option A is the only viable path on Hetzner. |
| D. Operator-supplied recovery shards (air-gap) | Documented in docs/SECURITY.md. Used when no automated boot-time secret pipeline is acceptable. |
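
For comparison, Option C would typically be expressed as a `seal` stanza in the server configuration. The sketch below uses AWS KMS as the example and assumes the same Helm values layout as the deployment snippet earlier in this README. The region and key ID are placeholders, and the assumption that credentials come from the pod's IAM role is not something this README states.

```yaml
# Sketch only (Option C): region and key ID are placeholders, not real values.
server:
  ha:
    enabled: true
    raft:
      enabled: true
      config: |
        storage "raft" {
          path = "/openbao/data"
        }

        # Auto-unseal via AWS KMS. Credentials are assumed to come from the
        # pod's IAM role rather than static keys in this file.
        seal "awskms" {
          region     = "<aws-region>"
          kms_key_id = "<kms-key-id>"
        }
```

Option B would follow the same shape with a `seal "transit"` stanza pointing at the peer OpenBao instead of a cloud KMS.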

Part of OpenOva