Vertical Pod Autoscaler (VPA)

Automated resource right-sizing for workloads. VPA is per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.4): it runs on every host cluster a Sovereign owns.

Status: Accepted | Updated: 2026-04-27

Overview

VPA provides automated resource optimization:

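Reduces waste from over-provisioned CPU and memory requests
Prevents under-provisioning issues such as CPU throttling and out-of-memory kills
Works alongside horizontal scaling (KEDA)
Provides recommendations even when they are not applied automatically (`Off` mode)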

Architecture

flowchart TB
    subgraph VPA["VPA Components"]
        Rec[Recommender]
        Upd[Updater]
        Adm[Admission Controller]
    end

    subgraph Metrics["Metrics"]
        MS[Metrics Server]
        Prom[Prometheus/Mimir]
    end

    subgraph Workloads["Workloads"]
        Deploy[Deployments]
        Pods[Pods]
    end

    MS --> Rec
    Prom --> Rec
    Rec -->|"Recommendations"| Upd
    Upd -->|"Evict pods"| Pods
    Adm -->|"Mutate requests"| Pods
    Deploy --> Pods

Update Modes

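Mode	Behavior	Use Case
`Off`	Compute recommendations only; never change pods	Analysis before enabling auto-apply
`Initial`	Apply recommendations at pod creation only	Batch jobs
`Auto`	Apply at pod creation and update running pods by evicting and recreating them (currently equivalent to `Recreate`)	Long-running services
`Recreate`	Apply at pod creation and update running pods by eviction	Workloads that must restart on every request change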

Recommended: Auto for most workloads
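
For workloads where eviction is not yet acceptable, `Off` mode gives the recommendations without touching pods. The manifest below is a minimal sketch of that setup; the name `example-app` and the namespace `example` are placeholders, not names from this repository.

```yaml
# Recommendation-only VPA. "example-app" and "example" are hypothetical names.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-app-vpa
  namespace: example
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  updatePolicy:
    # Quoted because YAML 1.1 parsers read a bare Off as the boolean false.
    updateMode: "Off"
```

The Recommender writes its results to `status.recommendation.containerRecommendations` on the VPA object, so `kubectl describe vpa example-app-vpa -n example` shows the target and the lower and upper bounds per container.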

Configuration

VPA Resource

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: <org>-app-vpa
  namespace: <org>
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: <org>-app
  updatePolicy:
    updateMode: Auto
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
        controlledResources:
          - cpu
          - memory
        controlledValues: RequestsAndLimits
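
The `"*"` entry above applies the same bounds to every container in the pod. As a sketch of a per-container override, the fragment below keeps the wildcard bounds and opts one container out of VPA control with `mode: "Off"`. The container name `log-shipper` is hypothetical.

```yaml
# Fragment of spec.resourcePolicy; "log-shipper" is a hypothetical sidecar name.
resourcePolicy:
  containerPolicies:
    # A named entry is used instead of the "*" default for that container.
    - containerName: log-shipper
      mode: "Off"
    # Every other container keeps the bounds from the example above.
    - containerName: "*"
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources:
        - cpu
        - memory
      controlledValues: RequestsAndLimits
```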

Kyverno Auto-Generation

Kyverno generates a VPA for every Deployment unless the Deployment carries the `vpa.openova.io/skip: "true"` annotation:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: generate-vpa
spec:
  rules:
    - name: generate-vpa-for-deployment
      match:
        any:
          - resources:
              kinds:
                - Deployment
      exclude:
        any:
          - resources:
              annotations:
                vpa.openova.io/skip: "true"
      generate:
        apiVersion: autoscaling.k8s.io/v1
        kind: VerticalPodAutoscaler
        name: "{{request.object.metadata.name}}-vpa"
        namespace: "{{request.object.metadata.namespace}}"
        data:
          spec:
            targetRef:
              apiVersion: apps/v1
              kind: Deployment
              name: "{{request.object.metadata.name}}"
            updatePolicy:
              updateMode: Auto
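
The `exclude` block in the policy matches on Deployment annotations, so opting out is done on the Deployment itself. The manifest below is a sketch of such a Deployment; only the annotation key and value come from the policy, and the names, image, and resource values are placeholders.

```yaml
# Hypothetical Deployment that opts out of VPA auto-generation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  namespace: example
  annotations:
    # Must sit on the Deployment metadata, not on the pod template,
    # because the policy matches the Deployment resource.
    vpa.openova.io/skip: "true"
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: registry.example.com/example-app:1.0.0
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```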

VPA + KEDA Interaction

flowchart LR
    subgraph Scaling["Scaling"]
        VPA[VPA<br/>Vertical]
        KEDA[KEDA<br/>Horizontal]
    end

    subgraph Workload["Workload"]
        Deploy[Deployment]
        Pods[Pods]
    end

    VPA -->|"Right-size resources"| Pods
    KEDA -->|"Scale replicas"| Deploy
    Deploy --> Pods

Coordination:

VPA handles CPU/memory per pod
KEDA handles replica count
Combined: each pod is right-sized while the replica count follows demand
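
This split holds when the two autoscalers react to different signals. KEDA drives replicas through a Horizontal Pod Autoscaler, and upstream VPA documentation advises against combining VPA with an HPA that scales on the same CPU or memory metric. The sketch below therefore scales on a request-rate query instead of a `cpu` or `memory` trigger; the object names, server address, query, and threshold are all placeholders.

```yaml
# Hypothetical ScaledObject for a VPA-managed Deployment.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: example-app-scaler
  namespace: example
spec:
  scaleTargetRef:
    name: example-app   # kind defaults to Deployment
  minReplicaCount: 2
  maxReplicaCount: 10
  triggers:
    # Non-resource trigger, so VPA request changes do not feed back into scaling.
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.example.svc:9090
        query: sum(rate(http_requests_total{app="example-app"}[2m]))
        threshold: "100"
```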

Monitoring

Metric	Description
`vpa_recommender_*`	Recommender metrics
`vpa_updater_*`	Updater metrics
`container_resource_recommendations`	Per-container recommendations

Dashboard

The Grafana dashboard shows:

Current vs recommended resources
Historical recommendations
Eviction events
Cost savings estimates

Part of OpenOva