
e3mrah aa60cfb84e fix(multi): Family G - 6 singletons (C8-001/C8-005/C9-006/C10-002/C10-003/C7-007) (#1601)	2026-05-17 22:20:29 +04:00

Wave 2 Family G batched ship.

C7-004 (sso/wiki/workflows/storybook + registry/api HTTPRoutes) was intentionally skipped. sso/wiki/storybook have no shipped backend. registry (harbor) + api (catalyst-api) HTTPRoutes already exist, and the 404 is a runtime/HR-readiness symptom, not a missing route. Flagged for an architect-led ticket rather than silent route-alias synthesis.

C9-006: hcloud-volumes StorageClass missing on fresh prov
Root cause: platform/hcloud-csi/chart/ existed but was never wired into bootstrap-kit, so fresh Sovereigns defaulted PVCs to local-path (rancher.io/local-path). That class is node-pinned and can't survive a Pod reschedule.
Fix: new slot 17a-bp-hcloud-csi.yaml + chart 1.0.0→1.1.0 bump that adds templates/hcloud-token-secret.yaml so the controller can authenticate to Hetzner. Mirrors the bp-hcloud-ccm (slot 55) + bp-cluster-autoscaler-hcloud (slot 50) wiring.

C10-002: /fleet/applications returns 0 items despite 21 sovereigns
Root cause: collectFleetSovereigns filtered out rows with AdoptedAt != nil (mirroring ListDeployments). On a steady-state fleet every Sovereign is adopted, so the dashboard rendered empty despite hundreds of succeeded jobs.
Fix: remove the adopted filter from collectFleetSovereigns (the fleet view's whole purpose is to enumerate every provisioned Sovereign). ListDeployments still applies the filter because it backs the provisioner's in-flight tab, a different surface. Adopted rows surface with Health=green when health is otherwise unknown.

C10-003: per-region install-* Jobs stuck "pending" despite ready
Root cause: lastState dedup in helmwatch_bridge. Secondary watchers attaching AFTER an HR had already settled at Installed never observed a state transition, so the seed value (HelmStatePending) never converged.
Fix: at markPhase1Done(OutcomeReady), backfill every secondary watcher's informer snapshot into the shared jobs.Bridge via the idempotent SeedJobsFromInformerList path. Runs INLINE (not in a goroutine): runPhase1Watch defers stopSecondaries(), which clears dep.secondaryWatchers as soon as markPhase1Done returns, so a goroutine would race the cleanup.

C7-007: legacy sovereign-wildcard-tls Cert+Secret pair orphaned
Root cause: PR O moved the Cilium Gateway listener's certificateRefs to the dashed-suffix per-zone Secret but left the legacy bare-name Certificate template behind, so cert-manager kept renewing an orphan.
Fix: (a) rename the Certificate + Secret to the dashed-suffix shape (single source of truth), and (b) add a one-shot Job (legacy-cert-cleanup) that deletes the pre-PR-O Cert+Secret pair via alpine/k8s and is idempotent on fresh provs. The Job can be removed from kustomization.yaml once every live prov has reconciled past it.

C8-001: D22 Settings em-dash placeholders on chroot Sovereign
Root cause: SettingsPage read Capacity / CP size / Pool subdomain / BYO domain from useWizardStore() (zustand + persist, backed by localStorage). The chroot Sovereign console runs in a fresh browser session post-handover with empty localStorage, so the four fields rendered em-dashes. The data IS persisted on the deployment record (RedactedRequest); the gap was that Deployment.State() never surfaced it.
Fix: lift controlPlaneSize / sovereignPoolDomain / sovereignSubdomain / sovereignDomainMode / sovereignByoDomain / regionControlPlaneSizes / orgName / orgEmail into the State() map, extend the DeploymentSnapshot TS type, and have SettingsPage read the snapshot first with the wizard store as fallback (mothership wizard-in-flight case).

C8-005: D20 Jobs page missing region filter dropdown
Root cause: multi-region Sovereigns expose install-<region>:<chart> Jobs, but JobsTable offered only status / app / parent filters, forcing operators to type the region key into the free-text search.
Fix: a new regionFromJob(job) pure helper parses the canonical <region>:<chart> appId (fallback: install-<region>:<chart> jobName). The dropdown is visible only when 2+ regions appear in the current job set, so single-region Sovereigns never see a one-option no-op. Options are sorted lexically.

Test coverage: 4 helper cases + 3 dropdown cases in JobsTable.test.tsx.

Architect-first compliance:
• bp-hcloud-csi wiring mirrors the bp-hcloud-ccm (slot 55) pattern
• legacy-cert-cleanup uses alpine/k8s (NOT bitnami/kubectl; see the Bitnami-deprecation note at self-sovereign-cutover/values.yaml:252)
• the alpine/k8s image is pulled via harbor.openova.io/proxy-dockerhub (mirror-everything rule)
• regionFromJob mirrors the helmwatch_bridge.go componentID encoding (3 input shapes: bare, region-prefixed, install-region-prefixed)
• State() snapshot additions stay slim: only the 4 founder-flagged fields + a few zero-cost adjacents

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
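
The C8-005 parsing and dropdown rules can be sketched as follows. This is a TypeScript sketch under stated assumptions, not the code in JobsTable: the JobLike shape, the regionOptions helper, and the example region keys are hypothetical. Only the three id shapes (bare, region-prefixed, install-region-prefixed), the appId-then-jobName fallback, the 2+ region visibility rule, and the lexical sort come from the commit message.

```typescript
// Hypothetical job shape: only the two fields the commit message mentions.
interface JobLike {
  appId?: string;   // canonical "<region>:<chart>", or a bare "<chart>"
  jobName?: string; // fallback "install-<region>:<chart>"
}

const INSTALL_PREFIX = "install-";

// Extracts "<region>" from "<region>:<chart>" or "install-<region>:<chart>".
// Returns null for the bare "<chart>" shape, which carries no region.
function parseRegion(id: string | undefined): string | null {
  if (!id) return null;
  const body = id.indexOf(INSTALL_PREFIX) === 0 ? id.slice(INSTALL_PREFIX.length) : id;
  const colon = body.indexOf(":");
  return colon > 0 ? body.slice(0, colon) : null;
}

// Hypothetical regionFromJob: prefer the canonical appId, fall back to jobName.
function regionFromJob(job: JobLike): string | null {
  return parseRegion(job.appId) ?? parseRegion(job.jobName);
}

// Hypothetical helper: distinct regions for the dropdown, sorted lexically.
// An empty result means "hide the dropdown" (fewer than two regions present).
function regionOptions(jobs: JobLike[]): string[] {
  const seen: Record<string, true> = {};
  for (const job of jobs) {
    const region = regionFromJob(job);
    if (region !== null) seen[region] = true;
  }
  const regions = Object.keys(seen).sort();
  return regions.length >= 2 ? regions : [];
}

// Usage: three jobs across two regions yield a two-option dropdown.
console.log(JSON.stringify(regionOptions([
  { appId: "nbg1:bp-cnpg" },
  { jobName: "install-fsn1:bp-cilium" },
  { appId: "nbg1:bp-hcloud-csi" },
]))); // ["fsn1","nbg1"]
```

Returning an empty list for the single-region case lets the caller use one check (options.length > 0) to decide whether the dropdown renders at all.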
chart	fix(multi): Family G - 6 singletons (C8-001/C8-005/C9-006/C10-002/C10-003/C7-007) (#1601)	2026-05-17 22:20:29 +04:00
blueprint.yaml	feat(bp-hcloud-csi): scaffold Hetzner CSI driver Blueprint (slice H6, #1095) (#1119)	2026-05-08 22:56:19 +04:00
README.md	feat(bp-hcloud-csi): scaffold Hetzner CSI driver Blueprint (slice H6, #1095) (#1119)	2026-05-08 22:56:19 +04:00

README.md

bp-hcloud-csi

Status: Phase-0 scaffold (#1095 slice H6). Activated by EPIC-6 (#1101). Updated: 2026-05-08

Upstream Hetzner Cloud CSI driver (hetznercloud/csi-driver) wrapped as a Catalyst Blueprint. Provides the hcloud-volumes StorageClass that backs multi-node stateful workloads on Hetzner Sovereigns — required for CNPG primary/replica pairs across nodes (and across regions, once Cilium ClusterMesh + Continuum land in #1101).

This Blueprint is not in the bootstrap-kit. The default StorageClass on existing Sovereigns is local-path (single-node-bound). Installing this chart with defaultStorageClass: true flips the default to hcloud-volumes, so every new PVC that does not name a StorageClass lands on hcloud-volumes, while PVCs that already exist stay bound to local-path. Operators opt in deliberately, per Sovereign, once the migration path for existing PVCs has been sized.
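
The effect of flipping the default on new PVCs can be modelled in a few lines. This is a simplified TypeScript model of the Kubernetes default-StorageClass rule, not chart code; the interfaces and the classForNewPvc name are hypothetical. It assumes current Kubernetes behavior: a PVC created without storageClassName gets the default class, and if several classes carry the storageclass.kubernetes.io/is-default-class annotation, the most recently created one wins.

```typescript
// Minimal model of a StorageClass; field names are illustrative, not the API's.
interface StorageClassLike {
  name: string;
  isDefault: boolean; // storageclass.kubernetes.io/is-default-class: "true"
  createdAt: number;  // creation time, e.g. epoch milliseconds
}

// Minimal model of a new PVC. undefined means "storageClassName not set";
// an explicit "" means "no class" and is left alone.
interface NewPvc {
  storageClassName?: string;
}

// Class a new PVC receives at creation time. This models admission only;
// PVCs that already have a class are not re-evaluated.
function classForNewPvc(pvc: NewPvc, classes: StorageClassLike[]): string | undefined {
  if (pvc.storageClassName !== undefined) return pvc.storageClassName;
  let chosen: StorageClassLike | undefined;
  for (const sc of classes) {
    // With several defaults, the most recently created default wins.
    if (sc.isDefault && (chosen === undefined || sc.createdAt > chosen.createdAt)) {
      chosen = sc;
    }
  }
  return chosen ? chosen.name : undefined; // no default: the PVC gets no class
}

// Usage: local-path was the default first, hcloud-volumes was marked default later.
const classes: StorageClassLike[] = [
  { name: "local-path", isDefault: true, createdAt: 1 },
  { name: "hcloud-volumes", isDefault: true, createdAt: 2 },
];
console.log(classForNewPvc({}, classes)); // hcloud-volumes
console.log(classForNewPvc({ storageClassName: "local-path" }, classes)); // local-path
```

Under this rule, new unnamed PVCs move to hcloud-volumes even if local-path keeps its own default annotation, while workloads that pin storageClassName are unaffected.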

What it ships

Template	Effect
`storageclass.yaml`	StorageClass `hcloud-volumes` with `WaitForFirstConsumer` binding (provisioning waits until a consuming Pod is scheduled, so the volume is created in that node's Hetzner location) and `allowVolumeExpansion: true`. Optional `volumeBindingMode: Immediate` override for use cases that need pre-provisioning.
`storageclass-default.yaml`	Annotates `hcloud-volumes` as the cluster default when `.Values.defaultStorageClass: true`.
`volumesnapshotclass.yaml`	VolumeSnapshotClass for backup workflows. Default off.
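
As a reading aid, the first two rows of the table can be condensed into the object that storageclass.yaml and storageclass-default.yaml would jointly produce. This TypeScript sketch is a stand-in under assumptions, not the chart's templates: the OverlayValues shape (including the immediateBinding flag) and renderHcloudVolumes are hypothetical, reclaimPolicy: Delete is taken from the activation example, and csi.hetzner.cloud is the upstream driver's provisioner name.

```typescript
// Hypothetical values shape: keys mirror the table, not the chart's values.yaml.
interface OverlayValues {
  defaultStorageClass: boolean;
  immediateBinding?: boolean; // hypothetical name for the Immediate override
}

interface StorageClassManifest {
  apiVersion: "storage.k8s.io/v1";
  kind: "StorageClass";
  metadata: { name: string; annotations?: Record<string, string> };
  provisioner: string;
  reclaimPolicy: "Delete" | "Retain";
  volumeBindingMode: "WaitForFirstConsumer" | "Immediate";
  allowVolumeExpansion: boolean;
}

function renderHcloudVolumes(values: OverlayValues): StorageClassManifest {
  const manifest: StorageClassManifest = {
    apiVersion: "storage.k8s.io/v1",
    kind: "StorageClass",
    metadata: { name: "hcloud-volumes" },
    provisioner: "csi.hetzner.cloud", // upstream Hetzner CSI driver name
    reclaimPolicy: "Delete",
    // Default: defer provisioning until a Pod is scheduled.
    volumeBindingMode: values.immediateBinding ? "Immediate" : "WaitForFirstConsumer",
    allowVolumeExpansion: true,
  };
  // storageclass-default.yaml's job: mark the class as the cluster default.
  if (values.defaultStorageClass) {
    manifest.metadata.annotations = { "storageclass.kubernetes.io/is-default-class": "true" };
  }
  return manifest;
}

// Usage: JSON output is a manifest kubectl can apply directly.
console.log(JSON.stringify(renderHcloudVolumes({ defaultStorageClass: true }), null, 2));
```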

The upstream hcloud-csi-driver itself is pulled in as a Helm subchart from hetznercloud/csi-driver (referenced in Chart.yaml). The Catalyst overlay templates above sit alongside it.

Activation contract

```yaml
# values.yaml override (or per-Sovereign overlay)
enabled: true
defaultStorageClass: true              # flip the cluster default
hcloudCsi:
  controller:
    replicas: 2                        # HA for multi-node Sovereigns
  storageClasses:
    - name: hcloud-volumes
      reclaimPolicy: Delete
      volumeBindingMode: WaitForFirstConsumer
      allowVolumeExpansion: true
```

When enabled: false (the default), no resources render, so installing the chart is a no-op until the operator opts in.
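
The gating rule can be sketched as a pure function of the values. The ChartValues shape and the volumeSnapshotClass.enabled key are assumptions (the README only says the snapshot class is off by default), and the sketch covers the overlay templates only, not the upstream subchart.

```typescript
// Hypothetical values shape; volumeSnapshotClass.enabled is an assumed key.
interface ChartValues {
  enabled: boolean;
  defaultStorageClass: boolean;
  volumeSnapshotClass?: { enabled: boolean };
}

// Which overlay templates would emit a resource for the given values.
function renderedTemplates(values: ChartValues): string[] {
  if (!values.enabled) return []; // master switch off: install is a no-op
  const out: string[] = ["storageclass.yaml"];
  if (values.defaultStorageClass) out.push("storageclass-default.yaml");
  if (values.volumeSnapshotClass?.enabled) out.push("volumesnapshotclass.yaml");
  return out;
}

// Chart defaults render nothing; the activation example renders two templates.
console.log(JSON.stringify(renderedTemplates({ enabled: false, defaultStorageClass: false }))); // []
console.log(JSON.stringify(renderedTemplates({ enabled: true, defaultStorageClass: true })));
// ["storageclass.yaml","storageclass-default.yaml"]
```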

Why a separate Blueprint, not a values toggle on bp-cilium

CSI drivers are independent of the CNI. Bundling them would couple the network plane's upgrade cycle to the storage plane's. A separate Blueprint keeps the two surfaces independent.

Why default-OFF on `defaultStorageClass`

Flipping the cluster's default StorageClass is a destructive change for workloads relying on the previous default's volume-binding semantics. Existing Sovereigns ship with local-path as the default; an in-place migration plan (drain, re-create PVCs, copy data) is its own slice. Keeping defaultStorageClass: false here ensures installing the chart is reversible.

References

docs/EPICS-1-6-unified-design.md §3.9 row 6
docs/SRE.md §2.5 (CNPG storage requirements)
platform/cnpg/README.md §38-58
Upstream: https://github.com/hetznercloud/csi-driver