openova/platform/trivy
e3mrah cf35b4a9b6
fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858)
A17 (#1855) hot-patched 6 drifted blueprints (cilium, cert-manager, flux,
openbao, keycloak, gitea) where blueprint.yaml spec.version had silently
fallen behind chart/Chart.yaml version, breaking
TestBootstrapKit_BlueprintCardsHaveRequiredFields. The structural root
cause: the TBD-A6 auto-bump hook in blueprint-release.yaml updated only
clusters/_template/bootstrap-kit/<N>-<chart>.yaml pins on every chart
publish — never the upstream platform/<bp>/blueprint.yaml.

This PR extends the auto-bump hook to lockstep platform/<bp>/blueprint.yaml
spec.version whenever Chart.yaml version bumps. Both file edits land in
the SAME commit (subject becomes `deploy(<chart>): bump bootstrap-kit pin
X -> Y (auto, Refs TBD-A6)` with a secondary line noting the blueprint
lockstep). Idempotent reset-and-rewrite retry preserved for the existing
parallel-matrix race case.

Workflow changes (.github/workflows/blueprint-release.yaml):
  * New step `bump_blueprint` after `bump_pin` — locates
    ${matrix.path}/blueprint.yaml OR ${matrix.path}/chart/blueprint.yaml
    (handles both platform-leaf and products-umbrella conventions),
    filters to kind:Blueprint (defensive against CRD yaml at the
    products/catalyst/chart/crds path), reads current spec.version at
    2-space indent, sed-rewrites to CHART_VERSION, verifies post-write.
  * Commit step renamed to "Commit + push bootstrap-kit pin bump +
    blueprint.yaml lockstep"; stages both files, single commit, with
    convergent retry on conflict.
  * Summary block surfaces both bumps separately.

Regression test (tests/e2e/bootstrap-kit/main_test.go):
  * New TestBootstrapKit_BlueprintVersionLockstepSweep — walks
    platform/* and products/*, discovers every Blueprint manifest with
    a sibling Chart.yaml, asserts spec.version == Chart.yaml version.
    Covers ALL ~70 blueprints, not just the canonical 10 kit ones the
    existing TestBootstrapKit_BlueprintCardsHaveRequiredFields gates.
  * Failure messages name the file, drift direction, and the exact sed
    command to fix — drift remediation is mechanical.

Drift cleanup (mandatory companion, same shape as A17/#1855):
  26 Application-Blueprint blueprints whose spec.version had been left
  at 1.0.0 / 0.1.0 while Chart.yaml moved forward — synced down to
  Chart.yaml as authoritative. All currently surface in the new sweep
  test; without the cleanup the test would block this PR (and every
  subsequent one). Affected: alloy, cert-manager-{dynadot,powerdns}-webhook,
  cluster-autoscaler-hcloud, cnpg, crossplane-claims, external-secrets[-stores],
  falco, grafana, guacamole, harbor, hcloud-csi, k8s-ws-proxy, mimir,
  netbird, newapi, openclaw, powerdns, seaweedfs, self-sovereign-cutover,
  trivy, valkey, velero, vpa, products/dmz-vcluster.

After this lands, the next chart-version bump in any platform/<bp>/ folder
auto-converges all three artifacts (Chart.yaml, blueprint.yaml,
bootstrap-kit pin) in a single bot commit. No more manual collector PRs;
no more silent drift between chart and Blueprint manifest.

Closes #1856.
Refs #1855 (A17 hot-patch this replaces structurally), #1713 (original TBD-A6 auto-bump hook).

Co-authored-by: hatiyildiz <hatiyildiz@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 01:04:22 +04:00
chart fix(bp-trivy): node-collector tolerates control-plane taint (closes #769) (#772) 2026-05-04 17:38:29 +02:00
blueprint.yaml fix(ci): blueprint.yaml spec.version lockstep in auto-bump (Closes #1856) (#1858) 2026-05-19 01:04:22 +04:00
README.md docs(pass-32): registry-DNS sweep — harbor.<domain> across 9 component READMEs 2026-04-27 22:36:39 +02:00

Trivy

Image and IaC vulnerability scanning. Per-host-cluster infrastructure (see docs/PLATFORM-TECH-STACK.md §3.3) — runs in CI for Blueprint scans, in Harbor for registry scans, and at runtime via Trivy Operator on every host cluster.

Status: Accepted | Updated: 2026-04-27


Overview

Trivy provides unified security scanning at multiple levels: CI/CD, registry, and runtime.

flowchart LR
    subgraph CI["CI/CD Pipeline"]
        Code[Code] --> Scan1[Trivy Scan]
        Scan1 --> Build[Build Image]
    end

    subgraph Registry
        Build --> Harbor
        Harbor --> Scan2[Trivy Scan]
    end

    subgraph Runtime["Kubernetes"]
        Harbor --> Deploy[Deploy]
        TO[Trivy Operator] --> Scan3[Continuous Scan]
    end

Scanning Levels

| Level | Integration | Trigger |
| --- | --- | --- |
| CI/CD | Gitea Actions | On push/PR |
| Registry | Harbor (built-in) | On push |
| Runtime | Trivy Operator | Continuous |

Scanning Capabilities

| Target | Command |
| --- | --- |
| Container images | `trivy image` |
| Kubernetes manifests | `trivy config` |
| IaC (Terraform) | `trivy config` |
| SBOM generation | `trivy image --format cyclonedx` |
| Secrets detection | `trivy fs --scanners secret` |
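
As a rough illustration of how `trivy image` output might be consumed downstream, the sketch below tallies findings by severity from a JSON report. It assumes the report shape produced by `trivy image --format json` (a top-level `Results` list whose entries may carry a `Vulnerabilities` list). The function name and the sample data, including the placeholder CVE IDs, are hypothetical.

```python
import json
from collections import Counter


def count_by_severity(report: dict) -> Counter:
    """Tally vulnerabilities per severity in a parsed Trivy JSON report."""
    counts = Counter()
    # "Results" or "Vulnerabilities" can be absent or null when nothing is found.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            counts[vuln.get("Severity", "UNKNOWN")] += 1
    return counts


# Hypothetical sample, trimmed to the fields used above.
SAMPLE_REPORT = json.loads("""
{
  "Results": [
    {"Target": "app (alpine)", "Vulnerabilities": [
      {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
      {"VulnerabilityID": "CVE-0000-0002", "Severity": "HIGH"},
      {"VulnerabilityID": "CVE-0000-0003", "Severity": "HIGH"}
    ]},
    {"Target": "requirements.txt", "Vulnerabilities": null}
  ]
}
""")

if __name__ == "__main__":
    print(dict(count_by_severity(SAMPLE_REPORT)))
```

In practice the report would come from a file or from the scanner's standard output rather than an inline string.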

Harbor Integration

Harbor ships with Trivy as its built-in scanner. With scan-on-push enabled, images are scanned automatically when they are pushed.

sequenceDiagram
    participant CI as CI/CD
    participant H as Harbor
    participant T as Trivy
    participant K as Kubernetes

    CI->>H: Push image
    H->>T: Trigger scan
    T->>H: Return vulnerabilities
    alt Critical vulnerabilities
        H-->>CI: Block deployment
    else Clean
        H->>K: Allow pull
    end

Scan Policies

| Severity | CI/CD Action | Harbor Action |
| --- | --- | --- |
| Critical | Fail build | Block pull |
| High | Warn | Allow (configurable) |
| Medium | Info | Allow |
| Low | Info | Allow |
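
The policy table can be read as a lookup from severity to action. The following is a minimal sketch of that reading, not how Harbor or the CI pipeline implement it: the action labels, the `pass` outcome for an empty result, and both function names are assumptions made for illustration.

```python
# Severity -> (CI/CD action, Harbor action), transcribed from the table above.
POLICY = {
    "CRITICAL": ("fail", "block"),
    "HIGH": ("warn", "allow"),
    "MEDIUM": ("info", "allow"),
    "LOW": ("info", "allow"),
}

# Strictest first, used to pick the outcome when several severities are present.
CI_ORDER = ["fail", "warn", "info"]


def ci_action(severities):
    """Return the strictest CI/CD action for the severities found, or 'pass'."""
    actions = {POLICY[s][0] for s in severities if s in POLICY}
    for action in CI_ORDER:
        if action in actions:
            return action
    return "pass"  # assumption: no known findings means the build proceeds


def harbor_allows_pull(severities):
    """Harbor blocks the pull only if some severity maps to 'block'."""
    return all(POLICY[s][1] != "block" for s in severities if s in POLICY)
```

For example, `ci_action(["LOW", "HIGH"])` yields `"warn"`, while any critical finding yields `"fail"` and makes `harbor_allows_pull` return `False`.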

Trivy Operator

Trivy Operator continuously scans running workloads and publishes the results as VulnerabilityReport custom resources:

apiVersion: aquasecurity.github.io/v1alpha1
kind: VulnerabilityReport
# Generated automatically for each workload
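
To show how these reports might be inspected, here is a small sketch that picks out reports with critical findings. It assumes the input is the parsed output of `kubectl get vulnerabilityreports -A -o json` and that each report carries its totals under `report.summary.criticalCount`, which is my understanding of the VulnerabilityReport schema. The function name and the sample data are hypothetical.

```python
def workloads_with_criticals(report_list: dict) -> list:
    """Return (namespace, name, criticalCount) for reports with critical findings."""
    flagged = []
    for item in report_list.get("items", []):
        summary = item.get("report", {}).get("summary", {})
        critical = summary.get("criticalCount", 0)
        if critical > 0:
            meta = item.get("metadata", {})
            flagged.append((meta.get("namespace"), meta.get("name"), critical))
    # Worst offenders first.
    return sorted(flagged, key=lambda row: row[2], reverse=True)


# Hypothetical sample, trimmed to the fields used above.
SAMPLE_LIST = {
    "items": [
        {"metadata": {"namespace": "shop", "name": "replicaset-web-app"},
         "report": {"summary": {"criticalCount": 2, "highCount": 5}}},
        {"metadata": {"namespace": "shop", "name": "replicaset-api-app"},
         "report": {"summary": {"criticalCount": 0, "highCount": 1}}},
        {"metadata": {"namespace": "infra", "name": "daemonset-agent-app"},
         "report": {"summary": {"criticalCount": 7, "highCount": 0}}},
    ]
}

if __name__ == "__main__":
    for namespace, name, critical in workloads_with_criticals(SAMPLE_LIST):
        print(f"{namespace}/{name}: {critical} critical")
```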

Installation

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: trivy-operator
  namespace: trivy-system
spec:
  interval: 10m
  chart:
    spec:
      chart: trivy-operator
      version: "0.20.x"
      sourceRef:
        kind: HelmRepository
        name: aqua
        namespace: flux-system
  values:
    trivy:
      ignoreUnfixed: true
    operator:
      scanJobsConcurrentLimit: 5

CI/CD Integration

Gitea Actions

name: Security Scan
on: [push, pull_request]

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Scan filesystem
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'

      - name: Scan Kubernetes manifests
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'config'
          scan-ref: './k8s'
          severity: 'CRITICAL,HIGH'
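
The `severity` and `exit-code` inputs work together: as I understand Trivy's `--severity` and `--exit-code` flags, findings outside the severity filter are ignored, and the configured exit code is returned only when at least one finding remains. The sketch below models that interaction; it is not Trivy's implementation, and the function name is hypothetical.

```python
def trivy_exit_code(found_severities, severity="CRITICAL,HIGH", exit_code=1):
    """Model the step outcome for a list of finding severities.

    found_severities: severities of all findings in the scan target.
    severity: comma-separated filter, as passed to the action.
    exit_code: code to return when any finding survives the filter.
    """
    wanted = {s.strip().upper() for s in severity.split(",") if s.strip()}
    hit = any(s.upper() in wanted for s in found_severities)
    return exit_code if hit else 0
```

Under this model the filesystem step fails on a HIGH finding as well as a CRITICAL one, because both are in the filter. The manifest step sets no `exit-code`; assuming the action's default is `0`, it reports findings without failing the job.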

Kyverno Policy

Block Pods whose images lack a Trivy vulnerability attestation reporting zero critical findings:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: block-vulnerable-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-vulnerabilities
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "harbor.<location-code>.<sovereign-domain>/*"
          attestations:
            - type: https://cosign.sigstore.dev/attestation/vuln/v1
              conditions:
                - all:
                    - key: "{{ scanner }}"
                      operator: Equals
                      value: "trivy"
                    - key: "{{ criticalCount }}"
                      operator: LessThanOrEquals
                      value: "0"
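
The two conditions in the `all` block amount to a simple conjunction. The sketch below mirrors that logic under stated assumptions: the predicate is treated as a flat mapping with `scanner` and `criticalCount` keys (taken from the expressions in the policy; a real attestation predicate may be laid out differently), and a missing or non-numeric count is treated as a failure, which is a choice made here rather than documented Kyverno behaviour.

```python
def attestation_passes(predicate: dict) -> bool:
    """Mirror the `all` block: scanner Equals "trivy" AND criticalCount <= 0."""
    scanner_ok = predicate.get("scanner") == "trivy"
    # A missing count is treated as failing, so unscanned images are not admitted.
    try:
        critical = int(predicate["criticalCount"])
    except (KeyError, TypeError, ValueError):
        return False
    return scanner_ok and critical <= 0
```

With this reading, `{"scanner": "trivy", "criticalCount": 0}` is admitted, while a non-zero count or a different scanner is rejected.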

Monitoring

Key Metrics

| Metric | Query |
| --- | --- |
| Vulnerability count | `trivy_vulnerability_id` |
| Critical vulns | `count(trivy_vulnerability_id{severity="CRITICAL"})` |
| Scan status | `trivy_image_vulnerabilities` |

Alerts

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: trivy-alerts
  namespace: monitoring
spec:
  groups:
    - name: trivy
      rules:
        - alert: CriticalVulnerabilityFound
          expr: count(trivy_vulnerability_id{severity="CRITICAL"}) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Critical vulnerability detected"

Consequences

Positive:

  • Unified scanning across CI/CD, registry, and runtime
  • Integrated with Harbor (mandatory component)
  • Shift-left security with fast feedback
  • SBOM generation for compliance

Negative:

  • False positives require triage
  • Scans add time to the CI/CD pipeline
  • The operator and its scan jobs consume cluster resources

Part of OpenOva