Wave 2 Family G batched ship. C7-004 (sso/wiki/workflows/storybook +
registry/api HTTPRoutes) intentionally skipped — sso/wiki/storybook
have no shipped backend; registry (harbor) + api (catalyst-api) HTTPRoutes
already exist and 404 is a runtime/HR-readiness symptom, not a missing
route. Flagged for architect-led ticket rather than silent route-alias
synthesis.
C9-006 — hcloud-volumes StorageClass missing on fresh prov
Root cause: platform/hcloud-csi/chart/ existed but was never wired
into bootstrap-kit, so fresh Sovereigns defaulted PVCs to local-path
(rancher.io/local-path) — node-pinned, can't survive Pod reschedule.
Fix: new slot 17a-bp-hcloud-csi.yaml + chart 1.0.0→1.1.0 bump that
adds templates/hcloud-token-secret.yaml so the controller can
authenticate to Hetzner. Mirrors bp-hcloud-ccm (slot 55) +
bp-cluster-autoscaler-hcloud (slot 50) wiring.
C10-002 — /fleet/applications returns 0 items despite 21 sovereigns
Root cause: collectFleetSovereigns excluded rows with AdoptedAt != nil (a
filter copied from ListDeployments). On a steady-state fleet every Sovereign is adopted,
so the dashboard rendered empty despite hundreds of succeeded jobs.
Fix: remove the adopted-filter from collectFleetSovereigns (the
fleet view's whole purpose is to enumerate every provisioned
Sovereign). ListDeployments still applies the filter — it backs the
provisioner's in-flight tab, a different surface. Adopted rows
surface with Health=green when otherwise unknown.
C10-003 — per-region install-* Jobs stuck "pending" despite ready
Root cause: lastState dedup in helmwatch_bridge — secondary
watchers attaching AFTER an HR already settled at Installed never
observed a state transition, so the seed value (HelmStatePending)
never converged. Fix: at markPhase1Done(OutcomeReady), backfill
every secondary watcher's informer snapshot into the shared
jobs.Bridge via the idempotent SeedJobsFromInformerList path.
Runs INLINE (not goroutine) — runPhase1Watch defers
stopSecondaries() which clears dep.secondaryWatchers as soon as
markPhase1Done returns, so a goroutine would race the cleanup.
C7-007 — legacy sovereign-wildcard-tls Cert+Secret pair orphaned
Root cause: PR O moved the Cilium Gateway listener's
certificateRefs to the dashed-suffix per-zone Secret but left the
legacy bare-name Certificate template behind, so cert-manager
kept renewing an orphan. Fix: (a) rename the Certificate +
Secret to the dashed-suffix shape (single-source-of-truth), and
(b) add a one-shot Job (legacy-cert-cleanup) that deletes the
pre-PR-O Cert+Secret pair via alpine/k8s, idempotent for fresh
provs. Removable from kustomization.yaml once every live prov
has reconciled past it.
C8-001 — D22 Settings em-dash placeholders on chroot Sovereign
Root cause: SettingsPage read Capacity / CP size / Pool subdomain /
BYO domain from useWizardStore() (zustand+persist localStorage).
The chroot Sovereign console runs on a fresh browser session
post-handover with empty localStorage, so the four fields rendered
em-dashes. The data IS persisted on the deployment record
(RedactedRequest) — gap was that Deployment.State() never surfaced
it. Fix: lift controlPlaneSize / sovereignPoolDomain /
sovereignSubdomain / sovereignDomainMode / sovereignByoDomain /
regionControlPlaneSizes / orgName / orgEmail to the State() map,
extend the DeploymentSnapshot TS type, and have SettingsPage read
snapshot-first with the wizard store as fallback (the mothership
wizard-in-flight case).
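A minimal sketch of the snapshot-first read described for C8-001, assuming a simplified field set. The `SettingsFields` shape, the `pickField` helper, and the sample values are hypothetical and only illustrate the lookup order; the real SettingsPage and DeploymentSnapshot types may differ.

```typescript
// Hypothetical subset of the fields that State() now surfaces.
interface SettingsFields {
  controlPlaneSize?: string;
  sovereignPoolDomain?: string;
  sovereignSubdomain?: string;
  sovereignByoDomain?: string;
}

// Placeholder shown when neither source has a value (U+2014).
const PLACEHOLDER = "\u2014";

// Read the deployment snapshot first; fall back to the wizard store
// (still populated while the mothership wizard is in flight); render
// the placeholder only when both are empty.
function pickField(
  key: keyof SettingsFields,
  snapshot: SettingsFields | null,
  wizard: SettingsFields,
): string {
  const fromSnapshot = snapshot?.[key];
  if (fromSnapshot) return fromSnapshot;
  const fromWizard = wizard[key];
  return fromWizard ? fromWizard : PLACEHOLDER;
}
```

On a chroot Sovereign console the wizard store is empty, so every value comes from the snapshot; on the mothership mid-wizard the snapshot may not exist yet, so the wizard store fills in.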
C8-005 — D20 Jobs page missing region filter dropdown
Root cause: multi-region Sovereigns expose install-<region>:<chart>
Jobs but JobsTable offered only status / app / parent filters,
forcing operators to type the region key into the free-text search.
Fix: new regionFromJob(job) pure helper parses the canonical
<region>:<chart> appId (fallback: install-<region>:<chart> jobName).
Dropdown is visible only when 2+ regions appear in the current job
set (single-region Sovereigns see no one-option no-op). Sorted
lexically. Test coverage: 4 helper cases + 3 dropdown cases in
JobsTable.test.tsx.
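The parsing and visibility rules for C8-005 can be sketched as follows. This is an illustration under assumptions, not the shipped helper: the `JobLike` shape and the `regionOptions` function are hypothetical, and the region keys in the usage note are sample values.

```typescript
// Hypothetical minimal job shape; the real Job type carries more fields.
interface JobLike {
  appId?: string;
  jobName?: string;
}

// Returns the region key for a job, or null for bare (single-region) jobs.
// Primary source: the "<region>:<chart>" appId.
// Fallback: an "install-<region>:<chart>" jobName.
function regionFromJob(job: JobLike): string | null {
  const appId = job.appId ?? "";
  const sep = appId.indexOf(":");
  if (sep > 0) return appId.slice(0, sep);

  const prefix = "install-";
  const name = job.jobName ?? "";
  if (name.indexOf(prefix) === 0) {
    const rest = name.slice(prefix.length);
    const i = rest.indexOf(":");
    if (i > 0) return rest.slice(0, i);
  }
  return null;
}

// Dropdown options: distinct regions sorted lexically, or an empty list
// when fewer than two regions appear (the dropdown stays hidden).
function regionOptions(jobs: JobLike[]): string[] {
  const regions = new Set<string>();
  for (const job of jobs) {
    const region = regionFromJob(job);
    if (region !== null) regions.add(region);
  }
  const sorted = Array.from(regions).sort();
  return sorted.length >= 2 ? sorted : [];
}
```

For example, jobs with appIds `hel1:bp-cilium` and `fsn1:bp-cilium` would yield the options `fsn1`, `hel1`, while a job set that only ever mentions `fsn1` yields no options.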
Architect-first compliance:
• bp-hcloud-csi wiring mirrors bp-hcloud-ccm (slot 55) pattern
• legacy-cert-cleanup uses alpine/k8s (NOT bitnami/kubectl — see
self-sovereign-cutover/values.yaml:252 Bitnami-deprecation note)
• alpine/k8s image pulled via harbor.openova.io/proxy-dockerhub
(mirror-everything rule)
• regionFromJob mirrors helmwatch_bridge.go componentID encoding
(3 input shapes: bare, region-prefixed, install-region-prefixed)
• State() snapshot additions stay slim — only the 4 founder-flagged
fields + a few zero-cost adjacents
Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bp-hcloud-csi
Status: Phase-0 scaffold (#1095 slice H6). Activated by EPIC-6 (#1101). Updated: 2026-05-08
Upstream Hetzner Cloud CSI driver (hetznercloud/csi-driver) wrapped as a
Catalyst Blueprint. Provides the hcloud-volumes StorageClass that backs
multi-node stateful workloads on Hetzner Sovereigns — required for CNPG
primary/replica pairs across nodes (and across regions, once Cilium
ClusterMesh + Continuum land in #1101).
This Blueprint is not in the bootstrap-kit. The default StorageClass on
existing Sovereigns is local-path (single-node-bound). Installing this
chart with defaultStorageClass: true flips the default to hcloud-volumes,
so every new PVC that omits storageClassName would be provisioned as a
Hetzner volume. Operators activate deliberately per
Sovereign once the upgrade path for existing PVCs is sized.
What it ships
| Template | Effect |
|---|---|
| storageclass.yaml | StorageClass hcloud-volumes with WaitForFirstConsumer binding (Pod scheduling pins the volume to the right node) and allowVolumeExpansion: true. Optional volumeBindingMode: Immediate override for use cases that need pre-provisioning. |
| storageclass-default.yaml | Annotates hcloud-volumes as the cluster default when .Values.defaultStorageClass: true. |
| volumesnapshotclass.yaml | VolumeSnapshotClass for backup workflows. Default off. |
Upstream hcloud-csi-driver itself is pulled in as a Helm subchart from
hetznercloud/csi-driver (referenced in Chart.yaml). The catalyst overlay
templates above sit alongside it.
Activation contract
# values.yaml override (or per-Sovereign overlay)
enabled: true
defaultStorageClass: true # flip the cluster default
hcloudCsi:
controller:
replicas: 2 # HA for multi-node Sovereigns
storageClasses:
- name: hcloud-volumes
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
When enabled: false (the default), no resources render — installing the
chart is a no-op until the operator opts in.
Why a separate Blueprint, not a values toggle on bp-cilium
CSI drivers are independent of the CNI. Putting both in one chart would couple the network plane's upgrade cycle to the storage plane's. A separate Blueprint keeps the two surfaces independent.
Why default-OFF on defaultStorageClass
Flipping the cluster's default StorageClass is a destructive change for
Pods relying on the previous default's volume-binding semantics. Existing
Sovereigns ship with local-path as default; an in-place migration plan
(drain, re-create PVCs, copy data) is its own slice. Keeping defaultStorageClass: false here ensures that installing the chart is reversible.
References
- docs/EPICS-1-6-unified-design.md §3.9 row 6
- docs/SRE.md §2.5 (CNPG storage requirements)
- platform/cnpg/README.md §38-58
- Upstream: https://github.com/hetznercloud/csi-driver