A17 (#1855) hot-patched 6 drifted blueprints (cilium, cert-manager, flux, openbao, keycloak, gitea) where blueprint.yaml spec.version had silently fallen behind chart/Chart.yaml version, breaking TestBootstrapKit_BlueprintCardsHaveRequiredFields.

The structural root cause: the TBD-A6 auto-bump hook in blueprint-release.yaml updated only clusters/_template/bootstrap-kit/<N>-<chart>.yaml pins on every chart publish, never the upstream platform/<bp>/blueprint.yaml.

This PR extends the auto-bump hook to lockstep platform/<bp>/blueprint.yaml spec.version whenever Chart.yaml version bumps. Both file edits land in the SAME commit (subject becomes `deploy(<chart>): bump bootstrap-kit pin X -> Y (auto, Refs TBD-A6)` with a secondary line noting the blueprint lockstep). Idempotent reset-and-rewrite retry preserved for the existing parallel-matrix race case.

Workflow changes (.github/workflows/blueprint-release.yaml):

* New step `bump_blueprint` after `bump_pin`: locates ${matrix.path}/blueprint.yaml OR ${matrix.path}/chart/blueprint.yaml (handles both platform-leaf and products-umbrella conventions), filters to kind:Blueprint (defensive against CRD yaml at the products/catalyst/chart/crds path), reads current spec.version at 2-space indent, sed-rewrites to CHART_VERSION, verifies post-write.
* Commit step renamed to "Commit + push bootstrap-kit pin bump + blueprint.yaml lockstep"; stages both files, single commit, with convergent retry on conflict.
* Summary block surfaces both bumps separately.

Regression test (tests/e2e/bootstrap-kit/main_test.go):

* New TestBootstrapKit_BlueprintVersionLockstepSweep: walks platform/* and products/*, discovers every Blueprint manifest with a sibling Chart.yaml, asserts spec.version == Chart.yaml version. Covers ALL ~70 blueprints, not just the canonical 10 kit ones the existing TestBootstrapKit_BlueprintCardsHaveRequiredFields gates.
* Failure messages name the file, drift direction, and the exact sed command to fix, so drift remediation is mechanical.

Drift cleanup (mandatory companion, same shape as A17/#1855): 26 Application-Blueprint blueprints whose spec.version had been left at 1.0.0 / 0.1.0 while Chart.yaml moved forward, synced down to Chart.yaml as authoritative. All currently surface in the new sweep test; without the cleanup the test would block this PR (and every subsequent one).

Affected: alloy, cert-manager-{dynadot,powerdns}-webhook, cluster-autoscaler-hcloud, cnpg, crossplane-claims, external-secrets[-stores], falco, grafana, guacamole, harbor, hcloud-csi, k8s-ws-proxy, mimir, netbird, newapi, openclaw, powerdns, seaweedfs, self-sovereign-cutover, trivy, valkey, velero, vpa, products/dmz-vcluster.

After this lands, the next chart-version bump in any platform/<bp>/ folder auto-converges all three artifacts (Chart.yaml, blueprint.yaml, bootstrap-kit pin) in a single bot commit. No more manual collector PRs; no more silent drift between chart and Blueprint manifest.

Closes #1856.
Refs #1855 (A17 hot-patch this replaces structurally), #1713 (original TBD-A6 auto-bump hook).

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vertical Pod Autoscaler (VPA)
Automated resource right-sizing. VPA is per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.4): it runs on every host cluster a Sovereign owns.
Status: Accepted | Updated: 2026-04-27
Overview
VPA provides automated resource optimization:
- Reduces over-provisioning waste
- Prevents under-provisioning issues
- Works alongside horizontal scaling (KEDA)
- Provides recommendations even when they are not applied automatically
Architecture
```mermaid
flowchart TB
    subgraph VPA["VPA Components"]
        Rec[Recommender]
        Upd[Updater]
        Adm[Admission Controller]
    end

    subgraph Metrics["Metrics"]
        MS[Metrics Server]
        Prom[Prometheus/Mimir]
    end

    subgraph Workloads["Workloads"]
        Deploy[Deployments]
        Pods[Pods]
    end

    MS --> Rec
    Prom --> Rec
    Rec -->|"Recommendations"| Upd
    Upd -->|"Evict pods"| Pods
    Adm -->|"Mutate requests"| Pods
    Deploy --> Pods
```
Update Modes
| Mode | Behavior | Use Case |
|---|---|---|
| `Off` | Recommendations only | Analysis, not auto-apply |
| `Initial` | Apply on pod creation | Batch jobs |
| `Auto` | Evict and recreate | Long-running services |
| `Recreate` | Same as Auto | Legacy compatibility |
Recommended: Auto for most workloads
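For workloads where eviction is not acceptable, the `Off` mode from the table above collects recommendations without touching pods. The following is a minimal sketch, assuming a hypothetical Deployment named `<org>-batch`. The value is quoted because some YAML parsers read a bare `Off` as a boolean:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: <org>-batch-vpa      # hypothetical name, for illustration only
  namespace: <org>
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <org>-batch        # hypothetical workload
  updatePolicy:
    updateMode: "Off"        # recommend only; pods are never evicted or mutated
```

The resulting recommendation is written to the object's `status.recommendation` and can be inspected with `kubectl describe vpa <org>-batch-vpa -n <org>`.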
Configuration
VPA Resource
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: <org>-app-vpa
  namespace: <org>
spec:
  # Workload whose pods this VPA right-sizes
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <org>-app
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: "*"        # apply to every container in the pod
        # Recommendations are clamped to the [minAllowed, maxAllowed] range
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
        controlledResources:
          - cpu
          - memory
        controlledValues: RequestsAndLimits
```
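Because the `Auto` update mode applies new requests by evicting pods, and evictions go through the Kubernetes eviction API, a PodDisruptionBudget can limit how many replicas are disrupted at once. The following is a sketch only: the `app: <org>-app` label is an assumption about how the Deployment's pods are labelled, and the `minAvailable` value is illustrative.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: <org>-app-pdb        # hypothetical name
  namespace: <org>
spec:
  minAvailable: 1            # illustrative; keep at least one pod running during evictions
  selector:
    matchLabels:
      app: <org>-app         # assumed pod label, not taken from this README
```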
Kyverno Auto-Generation
Kyverno automatically generates a VPA for every Deployment, except those annotated with `vpa.openova.io/skip: "true"`:
```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-vpa
spec:
  rules:
    - name: generate-vpa-for-deployment
      match:
        any:
          - resources:
              kinds:
                - Deployment
      # Opt-out: Deployments carrying this annotation get no VPA
      exclude:
        any:
          - resources:
              annotations:
                vpa.openova.io/skip: "true"
      generate:
        apiVersion: autoscaling.k8s.io/v1
        kind: VerticalPodAutoscaler
        name: "{{request.object.metadata.name}}-vpa"
        namespace: "{{request.object.metadata.namespace}}"
        data:
          spec:
            targetRef:
              apiVersion: apps/v1
              kind: Deployment
              name: "{{request.object.metadata.name}}"
            updatePolicy:
              updateMode: Auto
```
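To opt a workload out of generation, the annotation that the policy's `exclude` block matches goes on the Deployment's own metadata (not on the pod template). The following is a minimal sketch; the labels, container name, and image are placeholders, not values from this repository:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <org>-app
  namespace: <org>
  annotations:
    vpa.openova.io/skip: "true"   # matched by the exclude block of generate-vpa
spec:
  replicas: 1
  selector:
    matchLabels:
      app: <org>-app              # placeholder label
  template:
    metadata:
      labels:
        app: <org>-app
    spec:
      containers:
        - name: app               # placeholder container
          image: registry.example.com/<org>/app:1.0.0   # placeholder image
```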
VPA + KEDA Interaction
```mermaid
flowchart LR
    subgraph Scaling["Scaling"]
        VPA[VPA<br/>Vertical]
        KEDA[KEDA<br/>Horizontal]
    end

    subgraph Workload["Workload"]
        Deploy[Deployment]
        Pods[Pods]
    end

    VPA -->|"Right-size resources"| Pods
    KEDA -->|"Scale replicas"| Deploy
    Deploy --> Pods
```
Coordination:
- VPA handles CPU/memory per pod
- KEDA handles replica count
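- Combined: optimal resource utilization, provided KEDA scales on event or external metrics rather than on the same CPU/memory usage that VPA is adjusting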
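The split of responsibilities above can be sketched with a KEDA `ScaledObject` that scales replicas on a request-rate signal and leaves CPU and memory sizing to VPA. This is an illustration under assumptions: the Prometheus address, the `http_requests_total` metric, the threshold, and the replica bounds are all hypothetical and would need to match the actual environment.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: <org>-app-scaler          # hypothetical name
  namespace: <org>
spec:
  scaleTargetRef:
    name: <org>-app               # same Deployment the VPA targets
  minReplicaCount: 2              # illustrative bounds
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.example.svc:9090   # placeholder address
        query: sum(rate(http_requests_total{app="<org>-app"}[2m]))   # hypothetical metric
        threshold: "100"          # illustrative target per replica
```

Driving replica count from a non-resource signal keeps the two autoscalers from reacting to the same CPU/memory measurements.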
Monitoring
| Metric | Description |
|---|---|
| `vpa_recommender_*` | Recommender metrics |
| `vpa_updater_*` | Updater metrics |
| `container_resource_recommendations` | Per-container recommendations |
Dashboard
Grafana dashboard shows:
- Current vs recommended resources
- Historical recommendations
- Eviction events
- Cost savings estimates
Part of OpenOva