Pass 9 — six more component READMEs got Catalyst-role banners
matching the rule of thumb in CLAUDE.md (every platform/<x>/README.md
should state its role in Catalyst).
- grafana: observability stack on every host cluster; Catalyst's
own self-monitoring + Application telemetry flows here.
- harbor: per-host-cluster container registry for Catalyst images,
mirrored Blueprint OCI artifacts, customer images.
- falco: runtime security on every host cluster; feeds SIEM/SOAR.
- kyverno: policy engine on every host cluster; enforces Catalyst
policy contracts (cosign on Blueprints, default-deny NetworkPolicies
on Organization namespaces, priority-class injection).
- sigstore: cosign-signed Blueprint OCI artifacts + admission
verification chain on every host cluster.
- syft-grype: SBOM generation in CI per Blueprint + runtime CVE scans.
Plus Kyverno priority-class clarification: prose around `tenant-high`
/ `tenant-default` / `tenant-batch` priority class names now reads
"Organization workloads" instead of "tenant workloads", with an
explicit note that the priority class artifact names themselves stay
as-is until a separate migration ticket renames them in deployed
clusters (Kubernetes object names are immutable, so renaming a
PriorityClass means deleting and recreating it).
VALIDATION-LOG: Pass 9 entry added.
Refs #37
Pass 8 — line-by-line read of platform/cnpg, platform/strimzi,
platform/k8gb, platform/keycloak, platform/cert-manager, platform/cilium.
CNPG and Strimzi: read in full and confirmed clean — they correctly
position themselves as Application Blueprints and don't drift from
the canonical model. CNPG's `<org>-postgres-dr` cluster name
(Application-tier database role) is acceptable per NAMING-CONVENTION
§1.3 (which only forbids primary/dr in K8s host-cluster names, not
in Application-internal CRD names).
Four READMEs updated:
k8gb:
- Header reframed: per-host-cluster infrastructure pointer to
PLATFORM-TECH-STACK §3.1 and SRE §2.4 split-brain protection.
- Removed dead link to ../failover-controller/docs/ADR-FAILOVER-
CONTROLLER.md (the failover-controller folder has no docs/);
replaced with link to that component's README + SRE §2.4.
keycloak:
- Header reframed from "FAPI Authorization Server for Open Banking"
(narrow) to "User identity for Catalyst Sovereigns" (broad).
Keycloak handles ALL user identity in Catalyst, not just FAPI.
- Added per-Org / per-Sovereign topology callout matching SECURITY
§6. Clarified that "Multi-tenant TPP" refers to PSD2 Third Party
Providers, not Catalyst's Organization-level multi-tenancy.
- FAPI features kept since Keycloak still serves Fingate as the
FAPI Authorization Server.
cert-manager:
- Header reframed as per-host-cluster infrastructure with pointer
to PLATFORM-TECH-STACK §3.3.
cilium:
- Header reframed as per-host-cluster infrastructure with pointer
to PLATFORM-TECH-STACK §3.1, including the install-first note
(CNI must come before any other workload during Phase 0).
VALIDATION-LOG: Pass 8 entry added.
Refs #37
Continuing Pass 7 cleanup after the OpenBao/ESO rewrite (42aeb62).
Gitea README:
- Was describing "Bidirectional mirroring for multi-region" with two
Gitea instances mirroring repos cross-region. Wrong: Catalyst's
agreed model has one Gitea per Sovereign on the management cluster
(PLATFORM-TECH-STACK §2.3). Replaced the multi-region mirror
diagram with a single-Gitea + intra-cluster HA topology and added
a "Why not cross-region bidirectional mirror" explainer (write-
conflict semantics would break EnvironmentPolicy enforcement).
- Status banner: notes the canonical references.
- Backup section: removed "Repository mirror for redundancy"
(replaced with Velero scheduled backups).
Flux README:
- "Multi-Region GitOps" section was showing one Gitea per region
with bidirectional mirror. Replaced with one Gitea per Sovereign
topology. Per-vcluster Flux pulls from this single Gitea.
Mermaid syntax bug:
- Earlier mass replace_all of "Catalyst IDP" → "Catalyst console"
had left an invalid mermaid node identifier
`Catalyst console[Catalyst console]` (mermaid forbids spaces in
node IDs). Fixed to `Console[Catalyst console]`. Would have
rendered as a broken diagram on GitHub.
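A rough way to catch this class of bug before it reaches GitHub is sketched below. It only inspects standalone `id[label]` node definitions (lines containing edges such as `A --> B[Label]` would be false positives), and the function name is hypothetical, not part of any repo tooling.

```python
import re

# Matches a standalone node definition such as `Console[Catalyst console]`.
NODE_DEF = re.compile(r"^\s*(?P<id>[^\[\]]+?)\[(?P<label>[^\]]*)\]\s*$")


def invalid_node_ids(lines):
    """Return node IDs that contain whitespace (hypothetical lint helper)."""
    bad = []
    for line in lines:
        match = NODE_DEF.match(line)
        if match and re.search(r"\s", match.group("id")):
            bad.append(match.group("id"))
    return bad
```

Run against the two forms mentioned above, only the pre-fix identifier is reported.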
VALIDATION-LOG: Pass 7 entry added documenting the OpenBao/ESO
active-active rewrite (the most consequential drift fix in any pass).
Refs #37
Pass 7 — line-by-line read of platform/openbao/README.md and
platform/external-secrets/README.md found a major architectural drift:
both files described an OLD active-active bidirectional sync model
that contradicts docs/SECURITY.md §5 (the canonical reference).
The active-active design was rejected during the architecture session
because it would have been a stretched cluster — a single region's
network blip would block writes everywhere. The agreed model is:
- Independent Raft cluster per region (intra-region quorum only).
- Single-primary writes; replicas accept reads only.
- Async Performance Replication primary → replicas (lag <1s typical).
- Explicit DR promotion (sovereign-admin or failover-controller).
Fixes:
platform/openbao/README.md:
- Overview: removed "active-active deployments" / "either region can
update secrets". Replaced with "independent Raft cluster per region",
"asynchronous Performance Replication".
- Architecture diagram: replaced bidirectional-push diagram with the
primary→replicas async perf replication topology that matches
SECURITY.md §5.
- ClusterSecretStores: simplified from "two stores (local+remote)" to
"one local store"; reads always pull locally.
- Renamed "PushSecret (Bidirectional)" → "Writes go to the primary
region" with a single-target PushSecret pointing at bao-primary.
- Added DR promotion section pointing at SECURITY.md §5.2.
- Status banner: notes that the canonical multi-region reference is
SECURITY.md.
platform/external-secrets/README.md:
- Header line: repositioned as per-host-cluster infrastructure with
pointer to PLATFORM-TECH-STACK §3.3.
- Removed broken link to non-existent ../openbao/docs/ADR-OPENBAO.md
(replaced with link to ../openbao/README.md).
- "Multi-region sync | Push to both OpenBao instances simultaneously"
→ "Multi-region reads | Async perf replication".
- "PushSecret to Multiple OpenBao Instances" example was writing to
two ClusterSecretStores in parallel — replaced with single-target
primary write.
- "Multi-region sync via single PushSecret" in Consequences →
"Cross-region availability via Performance Replication".
- Mermaid sequence diagram: "Bootstrap Wizard" actor → "Catalyst
Bootstrap (Phase 0)"; "Terraform" → "OpenTofu"; ESO connection
description "via K8s auth" → "via SPIFFE SVID (workload identity)".
These were the most consequential drift fixes found in any pass —
two READMEs were documenting an architecture explicitly rejected by
the agreed model.
Refs #37
Pass 6 — fresh-eyes line-by-line read of ARCHITECTURE.md. Found two
internal contradictions that earlier passes missed.
ARCHITECTURE §3 (topology diagram) listed Crossplane, Flux, Harbor,
and grafana-stack INSIDE the Catalyst control plane block. But §11
(Catalyst-on-Catalyst) explicitly says these are per-host-cluster
infrastructure, NOT Catalyst control-plane components. PLATFORM-TECH-
STACK §3 also classifies them as per-host-cluster.
Fixed: §3 topology diagram now shows only true Catalyst control-plane
components (console, marketplace, admin, catalog-svc, projector,
provisioning, environment-controller, blueprint-controller, billing,
gitea, nats-jetstream, openbao, keycloak, spire-server, observability)
and adds a separate line for "Plus per-host-cluster infrastructure"
that defers to PLATFORM-TECH-STACK §3 for the full list (Cilium, Flux,
Crossplane, cert-manager, ESO, Kyverno, Harbor, Reloader, Trivy, Falco,
Sigstore, Syft+Grype, VPA, KEDA, External-DNS, k8gb, Coraza, MinIO,
Velero, failover-controller). Also added the previously-missing
`provisioning` row.
JetStream Account scoping was contradictory:
- ARCHITECTURE §5 said "Per-Org account: ws.{org}-{env_type}.>" —
reads ambiguously: is the Account per-Org or per-Env?
- NAMING-CONVENTION §11.2 said "One JetStream Account scoped to
ws.{org}-{env_type}.>" — implied per-Environment.
- GLOSSARY + PLATFORM-TECH-STACK + SECURITY all say per-Organization.
Reconciled to the per-Org-Account-with-per-Env-subjects model:
- Account isolation: ONE NATS Account per Organization.
- Subjects within the Account use prefix `ws.{org}-{env_type}.>` for
per-Environment partitioning.
This is the cleanest isolation model: Accounts are NATS' strongest
isolation boundary (per-Org); subjects partition further within each
Account (per-Env).
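The reconciled model can be illustrated with a minimal sketch. The helper names are hypothetical, and using the Organization slug as the Account key is an assumption; only the `ws.{org}-{env_type}.>` prefix comes from the docs.

```python
def account_key(org: str) -> str:
    """One NATS Account per Organization (keying by slug is an assumption)."""
    return org


def env_subject_prefix(org: str, env_type: str) -> str:
    """Subject prefix that partitions one Environment inside the Org Account."""
    return f"ws.{org}-{env_type}.>"


def belongs_to_environment(subject: str, org: str, env_type: str) -> bool:
    """True if a concrete subject falls under the Environment's prefix."""
    prefix = env_subject_prefix(org, env_type)[:-1]  # drop the trailing '>'
    # '>' needs at least one more token after the prefix.
    return subject.startswith(prefix) and len(subject) > len(prefix)
```

Two Environments of the same Organization share `account_key("acme")` but get distinct prefixes, `ws.acme-prod.>` and `ws.acme-dev.>`.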
Refs #37
Concluding the validation loop with a process artifact. The new file
records:
- Why the validation existed (post-rewrite trust verification).
- Each pass's scope and concrete fixes (16 iterations across Pass 1
+ sweeps in Passes 2/3/4/5).
- The acceptance criteria as runnable grep commands so any future
contributor can re-verify.
- Authorship convention (hatiyildiz, per-commit identity flags).
- Re-validation cadence (after rewrites, after new banned terms,
after component renames, quarterly drift check).
Linked from README.md docs table.
This file is meant as a playbook for the next validation, not a
status snapshot — for status, IMPLEMENTATION-STATUS.md remains
canonical.
Refs #37
Two user-facing residuals where the banned product term "instance"
slipped through:
- docs/ARCHITECTURE.md §9: example console dialog "Use existing
instance or create a dedicated one?" → "Use an existing Postgres
Application or create a new dedicated one?". This is a UI prompt
text — must use the user-facing noun "Application", not "instance".
- docs/NAMING-CONVENTION.md §6.2 tag comment: "Application instance
name" → "Application name within the Environment". The CRD might
internally still use the noun Instance for class-vs-instance
semantics, but in tag annotations and user-visible context the
Application IS the instance.
Other "instance" occurrences confirmed legitimate (Postgres instance
as Crossplane resource type, Flux instance as software deployment,
EC2/Hetzner instance as cloud-provider terminology) and retained.
Final cross-reference check: all Markdown links across all canonical
docs resolve. No residual banned terms.
Refs #37
ARCHITECTURE §10 listed 3 provisioning phases (Phase 0 / 1 / 2) and
labeled Phase 2 as "Self-sufficient". SOVEREIGN-PROVISIONING.md uses
4 phases (Phase 0 Bootstrap / Phase 1 Hand-off / Phase 2 Day-1 setup
/ Phase 3 Steady-state). The same phase number meant different things
in the two docs.
Aligned ARCHITECTURE to the 4-phase numbering. SOVEREIGN-PROVISIONING
is now explicitly the canonical reference for phase semantics.
Refs #37
PERSONAS-AND-JOURNEYS and SECURITY were using two competing slugs
for the same example Organization:
- "muscat-pharmacy" (with hyphen) — used as Org name + Environment
name in the Ahmed journey narrative.
- "muscatpharmacy" (no hyphen) — used as the vcluster name in the
same paragraph, and used everywhere else (NAMING-CONVENTION
examples, ARCHITECTURE topology diagram, SECURITY SPIFFE ID).
NAMING §2.5 allows both spellings (Org slug regex permits hyphens).
But within a single example the spelling must be stable, otherwise
readers see a contradiction between Org and vcluster names.
Normalized to single-token "muscatpharmacy" throughout (matches the
predominant usage and produces simpler URLs / paths).
Result: all docs now show the same example Org consistently —
muscatpharmacy as Org, muscatpharmacy as vcluster, muscatpharmacy-prod
as Environment, gitea.omantel.openova.io/muscatpharmacy/muscatpharmacy-prod
as Environment Gitea repo.
Refs #37
After the PLATFORM-TECH-STACK reorganization (§2 = Catalyst control
plane, §3 = per-host-cluster infrastructure), IMPLEMENTATION-STATUS
§2 was still mixing the two — listing cilium, k8gb, kyverno, falco,
etc. under "Catalyst control plane components" alongside console,
projector, etc.
Split into:
- §2 (renumbered subsections 2.1, 2.2): Catalyst control plane only —
the per-Sovereign components that make a cluster a Sovereign.
- §2bis: Per-host-cluster infrastructure — the substrate every host
cluster needs (Cilium, Flux, Crossplane, cert-manager, ESO, Kyverno,
Trivy, Falco, Sigstore, Syft+Grype, VPA, KEDA, Reloader, MinIO,
Velero, Harbor, failover-controller).
Status flags retained per component (📐 design / 🚧 README only / ✅
implemented / ⏸ deferred). All per-host-cluster components currently
🚧 (READMEs exist; none yet packaged as deployable Blueprints).
This brings IMPLEMENTATION-STATUS into 1:1 correspondence with the
PLATFORM-TECH-STACK §2 / §3 / §4 categorization that other docs
reference.
Refs #37
Pass 2 — fresh-eyes sweep across the entire docs tree. One residual
entity-noun usage found:
- platform/external-secrets/README.md:75 (in a Mermaid sequence
diagram): "Note over Wizard: Operator saves unseal keys offline"
— "Operator" used as person/entity. Renamed to "sovereign-admin"
to match the role from GLOSSARY.md.
All other banned-term sweeps clean:
- No tenant (architectural) anywhere.
- No Catalyst IDP anywhere.
- No Synapse-as-product anywhere (only the legitimate
"Matrix/Synapse server" usages).
- No workspace-controller (only the banned-term entries that define
the rename).
- No capital-W Workspace as Catalyst scope.
- No github.com/openova (without -io).
- All cross-doc Markdown links resolve.
- All §X references resolve to the new section numbering after
PLATFORM-TECH-STACK reorg.
- API group catalyst.openova.io/v1alpha1 consistent across 6 references.
- OCI artifact prefix `bp-` consistent across README, CLAUDE,
BLUEPRINT-AUTHORING, IMPLEMENTATION-STATUS.
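A sweep of this kind could be approximated as below. This is a sketch only: the pattern table covers two of the banned terms listed above, the labels are made up for illustration, and the real acceptance checks are the grep commands recorded in the validation log.

```python
import re

# Illustrative subset of the banned-term list; labels are hypothetical.
BANNED = {
    "Catalyst IDP": re.compile(r"Catalyst IDP"),
    "github.com/openova without -io": re.compile(r"github\.com/openova(?!-io)"),
}


def find_banned(text: str):
    """Return (line_number, label) for every banned-term hit in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in BANNED.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

The negative lookahead is what lets `github.com/openova-io/...` pass while flagging the bare `github.com/openova/...` form.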
Other "Operator" mentions intentionally retained (legitimate
technical usage):
- "External Secrets Operator (ESO)", "Trivy Operator" — K8s
Operator pattern (controllers), explicitly allowed by GLOSSARY.
- "Operator compatibility" in BUSINESS-STRATEGY's OpenShift migration
table — refers to compatibility with K8s Operators (the technology),
not as an entity/role.
Refs #37
README + CLAUDE.md (iter 9):
- README's "Build a Blueprint" section was contradicting itself: said
"A Blueprint is a Git repo" while elsewhere we'd locked in the
monorepo decision. Rewritten: Blueprint = a folder under
platform/<name>/ or products/<name>/ in this monorepo. CI publishes
per-folder OCI artifacts.
- CLAUDE.md "Repo structure": replaced the brief tree with a more
honest one that distinguishes target structure from current
placeholders (core/apps/ is target console+projector+...; current
has only legacy bootstrap/ and manager/ .gitkeep dirs). Annotated
each products/<name>/ folder with current state (axon = real code;
others = README only; catalyst = bootstrap/ui scaffold).
- CLAUDE.md banned-terms entry "Workspace": now covers component
names too (was only Catalyst scope), matching GLOSSARY's expanded
banned-term entry.
PLATFORM-TECH-STACK (iter 10) — substantive reorganization:
The §1 categorization established three buckets:
(a) Catalyst control plane (per-Sovereign on mgt)
(b) Per-host-cluster infrastructure (every host cluster)
(c) Application Blueprints (a la carte)
But §2 "Catalyst control plane components" was mixing buckets (a)
and (b): it listed flux, crossplane, cert-manager, kyverno, harbor,
external-secrets, reloader, vpa, keda, k8gb, coraza, falco, trivy,
sigstore, syft-grype, minio, velero, failover-controller all under
"Catalyst control plane" — but those are per-host-cluster
infrastructure per §1, and §1 itself said Crossplane "Never
user-facing" / per-host-cluster.
Reorganized §2 + §3:
- §2 now contains ONLY the Catalyst control plane:
2.1 User-facing surfaces (console, marketplace, admin)
2.2 Catalyst backend services (projector, catalog-svc, provisioning,
environment-controller, blueprint-controller, billing)
2.3 Per-Sovereign supporting services (keycloak, openbao, spire-
server, nats-jetstream, gitea, observability)
- New §3 Per-host-cluster infrastructure with subsections for
networking, GitOps+IaC, security+policy, scaling+ops, storage+
registry, resilience.
- Application Blueprints renumbered §3 → §4. Added missing
opensearch row to §4.1 (was previously misplaced in observability).
- Composite Blueprints (Products) §4 → §5.
- Multi-Region §5 → §6. Resource estimates §6 → §7. Cluster
deployment §7 → §8. User choice §8 → §9. SIEM §9 → §10. License §10 → §11.
Cross-doc references to PLATFORM-TECH-STACK §1 / §2 (in NAMING,
ARCHITECTURE, IMPLEMENTATION-STATUS) all still resolve correctly
under the new numbering.
SRE (iter 11):
- §2.4 split-brain table: "MongoDB" → "FerretDB" (MongoDB was
retired in favor of FerretDB-on-CNPG per project-memory).
- §2.5 data replication: clarified each row's layer (Application
Blueprint vs per-host-cluster vs Catalyst control plane) instead
of misclassifying MinIO/Harbor as Application Blueprints. Added
OpenSearch row.
- §3.1 Flagger and §3.2 Flipt: explicitly marked "Status: design,
not yet a deployed Blueprint" since they're "components to watch"
in TECHNOLOGY-FORECAST, not in the current PLATFORM-TECH-STACK §3
inventory.
BUSINESS-STRATEGY + TECHNOLOGY-FORECAST (iter 12):
- Final scan: clean. No tenant/operator-team/Catalyst-IDP/Lifecycle
Manager/Synapse(product) violations remaining.
Refs #37
SECURITY (iter 6):
- "Environment repo" → "Environment Gitea repo" in §3 secrets diagram.
- "ChangePolicy enforces approvals" → "EnvironmentPolicy enforces
approvals" in §9 SOC2 row (ChangePolicy was a fictional CRD —
EnvironmentPolicy is the real one defined in ARCHITECTURE §8).
- "Catalyst's compliance-controller surfaces evidence" → "evidence
surfaced via Catalyst console audit views and SIEM exports"
(compliance-controller wasn't defined elsewhere; this avoids
inventing new components in compliance prose).
SOVEREIGN-PROVISIONING (iter 7):
- "vault-stored" → "stored in OpenBao on the provisioner"
(Vault was replaced by OpenBao; "vault-stored" was generic English
but read as a contradiction).
BLUEPRINT-AUTHORING (iter 8):
- OCI artifact naming locked: `ghcr.io/openova-io/bp-<name>:<semver>`
where `<name>` is the folder name. The `bp-` prefix lives in the
OCI artifact name (self-identifying), not the folder name.
Fixed in §1, §10, §11, §13 — and propagated to README.md so the
pattern is consistent across the repo.
- Crossplane Composition example: `compositeTypeRef.apiVersion`
changed from `bp-wordpress.openova.io/v1alpha1` (per-Blueprint
group, ugly) to `compose.openova.io/v1alpha1` (shared XRD group
across all Blueprints).
- §11 CI pipeline final step: "publish blueprint.yaml as the
manifest" → "as the OCI manifest's metadata layer" (clearer about
what it does in the OCI sense).
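For reference, the locked naming pattern amounts to something like the sketch below. The function name is hypothetical and the version check is a simplified semver pattern (no build metadata, since `+` is not valid in an OCI tag); the CI pipeline may validate differently.

```python
import re

# Simplified semver: MAJOR.MINOR.PATCH with optional pre-release suffix.
SEMVER = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$")


def blueprint_ref(folder_name: str, version: str) -> str:
    """Build the OCI reference for the Blueprint in platform/<name>/ or products/<name>/."""
    if not SEMVER.match(version):
        raise ValueError(f"not a semver tag: {version!r}")
    # The bp- prefix belongs to the artifact name, not the folder name.
    return f"ghcr.io/openova-io/bp-{folder_name}:{version}"
```

For example, a `wordpress` folder at version `1.2.3` maps to `ghcr.io/openova-io/bp-wordpress:1.2.3`.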
Refs #37
GLOSSARY.md line-by-line audit. Nine corrections.
1. workspace-controller → environment-controller everywhere. The
controller reconciles the Environment CRD; "workspace" is banned as
a Catalyst scope, so it cannot be in a component name either. Fixed
in: GLOSSARY, ARCHITECTURE, PLATFORM-TECH-STACK, NAMING-CONVENTION,
SOVEREIGN-PROVISIONING, IMPLEMENTATION-STATUS, core/README,
BUSINESS-STRATEGY. Banned-term entry in GLOSSARY now explicitly
covers component names too.
2. "workspace repos" (per-Environment Gitea repos) → "Environment
Gitea repos" in GLOSSARY, PLATFORM-TECH-STACK.
3. JWT claim {workspace, org, role} → {environment, org, role} in
ARCHITECTURE projector diagram.
4. OpenOva definition refined: was "Never used to name a product",
which contradicted "OpenOva Catalyst", "OpenOva Cortex". Now: brand
prefix in product names; bare "OpenOva" = the company; bare
"Catalyst" = the platform.
5. Catalyst definition completed: was missing provisioning, billing,
gitea, observability — now lists all 14 control-plane components,
pointing at the table below.
6. Catalyst components table: added `provisioning` (validates
configSchema, commits to Environment Gitea); reordered to match
ARCHITECTURE §3 grouping; clarified each component's source-of-truth
(catalog-svc reads monorepo + Gitea, blueprint-controller watches
monorepo + Gitea, etc.).
7. Environment definition: refers to NAMING §2.4 for env_type values;
removed inline list that didn't match canonical ordering. Added
concrete examples (acme-prod, acme-dev, bankdhofar-uat).
8. Application example: dropped "RocketChat" which appeared nowhere
else; replaced with generic "running deployment" plus the
established WordPress / Postgres examples.
9. sovereign-admin description: was "runs Crossplane" — Crossplane is
platform plumbing not user-facing. Now: "manages the underlying
clusters via Crossplane (which is platform plumbing, not a
user-facing surface)".
Banned-term coverage:
- "Workspace" entry now covers BOTH the Catalyst scope AND component
naming (workspace-controller → environment-controller).
Refs #37
First validation iteration. Three concrete corrections.
1. Add docs/IMPLEMENTATION-STATUS.md as the bridge between target
architecture and current code state. Status legend (✅ / 🚧 / 📐 / ⏸)
applied per-component. Catalyst control plane = mostly 📐. Component
READMEs = 🚧 (README only, no Blueprint manifests yet). products/axon
= ✅ (only product with real code). core/ = 📐 (just .gitkeep).
2. Status banner added to ARCHITECTURE, SECURITY, SOVEREIGN-PROVISIONING,
BLUEPRINT-AUTHORING, PERSONAS-AND-JOURNEYS, PLATFORM-TECH-STACK, SRE
pointing readers at IMPLEMENTATION-STATUS.md before they treat any
described feature as built. GLOSSARY also references it.
3. Architectural decision (Option A — monorepo canonical):
- Each platform/<name>/ and products/<name>/ folder is the source of
ONE Blueprint, published as ghcr.io/openova-io/<name>:<semver> by
CI fan-out from the monorepo root.
- BLUEPRINT-AUTHORING.md §1, §2, §13 rewritten to match.
- README.md "what's in this repo" rewritten to clarify monorepo +
OCI-fan-out shape; no longer claims every directory is a Blueprint
in a way that contradicts BLUEPRINT-AUTHORING.
Wrong-org fixes (3 places):
- docs/PERSONAS-AND-JOURNEYS.md:13 github.com/openova → openova-io
- docs/BLUEPRINT-AUTHORING.md:13 github.com/openova → openova-io
- docs/BLUEPRINT-AUTHORING.md:404 github.com/openova → openova-io
- docs/BLUEPRINT-AUTHORING.md ghcr.io/openova/* (3 refs) → openova-io
API group consistency:
- All references unified to catalyst.openova.io/v1alpha1
(was mixed v1 / v1alpha1; v1alpha1 is correct since the CRDs are
design-stage with no implementation).
core/README.md updated to honestly describe the directory tree as
"target structure with .gitkeep placeholders" rather than implying
the apps/console, apps/projector, etc. binaries already exist.
The legacy apps/bootstrap and apps/manager directories are
acknowledged as transitional placeholders that will be removed when
the new apps/ layout is scaffolded.
CLAUDE.md and .claude/project-memory.md updated to put
IMPLEMENTATION-STATUS.md second in the read-first ordering.
Refs #37
Targeted updates to BUSINESS-STRATEGY.md §5.1 and §9.2 plus
TECHNOLOGY-FORECAST §removed-components.
- BUSINESS-STRATEGY.md §5.1: OpenOva Catalyst row repositioned. It is
the platform itself (the self-sufficient Kubernetes-native control
plane that turns any cluster into a Sovereign), not a sub-product
bundling bootstrap+IDP+lifecycle manager. Other OpenOva products
(Cortex, Fingate, Fabric, Relay, Specter, Axon) run ON Catalyst as
composite Blueprints.
- BUSINESS-STRATEGY.md §9.2: capability matrix "Developer portal" cell
updated from "Catalyst IDP" to "Catalyst console" — IDP function is
one of the console's responsibilities, not a separate product.
- TECHNOLOGY-FORECAST.md §removed-components: Backstage row updated to
describe replacement as "Catalyst console (the platform's own
developer-facing UI)" rather than the now-retired "Catalyst IDP"
sub-product.
Strategy narrative, market segmentation, pricing model, and migration
playbook are unchanged — they stand on their own.
Refs #37
Two related rewrites that put the control plane / application Blueprint
distinction front and center.
PLATFORM-TECH-STACK.md
- §1: explicit three-way component categorization — Catalyst control
plane (one per Sovereign), per-host-cluster infrastructure (every
cluster), Application Blueprints (inside per-Org vclusters).
- §2: Catalyst control plane components listed by responsibility —
user-facing surfaces, backend services, identity, secrets, event
spine, GitOps, networking, security, scaling, storage,
observability, resilience.
- §3: Application Blueprints (the a-la-carte catalog). Valkey and
Strimzi are explicitly called out as Application Blueprints,
NOT control-plane components (control plane uses NATS JetStream).
- §4: composite Blueprints (Cortex, Axon, Fingate, Fabric, Relay)
repositioned as Applications running ON Catalyst, not as parallel
products.
- §5: multi-region diagram showing independent OpenBao Raft per
region, NATS leaf nodes, Crossplane on mgt.
- §6: resource estimates updated for control plane (~12 GB +
per-Org Keycloak in SME tier).
- §10: license posture table — every control-plane component carries
a redistribution-safe license (no BSL).
SRE.md
- §2: multi-region principles updated; explicit "no stretched
clusters" applies to OpenBao, JetStream, etcd, every quorum-
based component.
- §2.5: data replication patterns now scoped to Application
Blueprints (the things a customer installs), separate from
control-plane patterns documented in SECURITY.md and
ARCHITECTURE.md.
- §4: alert-to-action mapping segmented by Catalyst control plane
vs per-product (Cortex, Fingate); new alerts: OpenBaoSealed,
JetstreamLagHigh.
- §7-§13: terminology aligned to Catalyst (console instead of IDP);
runbooks now Runbook CRD-backed; incident severities updated.
- §13.2-13.3: Catalyst-specific incidents (workspace-controller,
OpenBao seal, projector lag) plus AI Hub incidents under
bp-cortex installation.
Refs #37
Repositions the public repo's identity. OpenOva is the company; Catalyst
is the platform. Sovereign is a deployed Catalyst. The historical
positioning (OpenOva = platform, Catalyst = bootstrap+IDP+lifecycle
sub-product) is retired. Catalyst now subsumes bootstrap, lifecycle, and
IDP responsibilities into one control plane.
- README.md Catalyst-first front door. Sovereign concept,
repo structure, stack at a glance, cloud
provider matrix, getting-started paths
(managed via marketplace.openova.io vs
self-host via catalyst-provisioner).
- CLAUDE.md Codebase guide for Claude. Banned-term table,
commit conventions (hatiyildiz default for
public repo), the no-fourth-surface rule,
per-component README rule of thumb.
- .claude/project-memory.md Reduced to an index + decision log;
full architecture moved to docs/. Stack
decisions locked (NATS JetStream, OpenBao,
SPIFFE/SPIRE, per-Org Keycloak SME / per-
Sovereign corporate, Crossplane only IaC,
no Terraform/Pulumi user-facing surface).
- core/README.md Catalyst control-plane Go application. Drops
the bootstrap-vs-manager split (both fold under
"Catalyst control plane"). Lists each component
deployable from this codebase: console,
marketplace, admin, projector, catalog-svc,
provisioning, workspace-controller, blueprint-
controller, billing. CRD list updated:
Sovereign / Organization / Environment /
Application / Blueprint / EnvironmentPolicy /
SecretPolicy / Runbook.
Refs #37
The naming convention pre-dates vcluster and Catalyst's user-facing
Environment object. Three additions, one rename:
- §2.4: {env} dimension renamed to {env_type} to disambiguate from the
Catalyst Environment object (which is the user-facing scope, not a
dimension).
- §2.5: new Organization dimension (slug, lowercase, hyphenated). Used
for vcluster identity and any Organization-scoped resource.
- §4.7: new vcluster naming layer. Pattern is just {org} within the
parent host cluster (Don't Repeat the Parent — Principle 1.2). Globally-
qualified form is {prov}-{reg}-{bb}-{env_type}-{org} for cross-cluster
references and kubeconfig contexts.
- §11: Catalyst Environment defined as the user-facing {org}-{env_type}
scope. One Environment is realized by N vclusters across regions × bb
filtered by Application Placement. Each Environment has its own Gitea
repo and JetStream Account.
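The three name shapes introduced above can be sketched as follows. The function names are hypothetical, and the dimension values used in the usage note (`hz`, `fsn`, `rtz`) are made-up placeholders rather than codes taken from the convention.

```python
def vcluster_name(org: str) -> str:
    """Local vcluster name inside its parent host cluster: just {org}."""
    return org


def qualified_vcluster_name(prov: str, reg: str, bb: str, env_type: str, org: str) -> str:
    """Globally-qualified form for cross-cluster references and kubeconfig contexts."""
    return "-".join([prov, reg, bb, env_type, org])


def environment_name(org: str, env_type: str) -> str:
    """User-facing Catalyst Environment scope: {org}-{env_type}."""
    return f"{org}-{env_type}"
```

With placeholder dimensions, `qualified_vcluster_name("hz", "fsn", "rtz", "prod", "muscatpharmacy")` yields `hz-fsn-rtz-prod-muscatpharmacy`, while the local name stays `muscatpharmacy`.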
Tags updated: openova.io/environment → openova.io/env-type for
disambiguation; new openova.io/organization, openova.io/vcluster,
openova.io/environment (for Catalyst scope), openova.io/sovereign tags.
DNS pattern §5 split into two: control-plane (component.{location-code}.
{sovereign-domain}) and Application (app.{environment}.{sovereign-or-org-
domain}) — supporting white-label Sovereigns where the Application DNS
uses the customer's own domain.
Refs #37
Client sends `thinking: true` to enable reasoning tokens. Default remains
disabled for instant streaming.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Qwen3-coder generates hundreds of `reasoning` tokens before `content`
tokens, causing 10+ second perceived delay. The reasoning tokens stream
through Axon but the ChatWidget only renders `delta.content`, so users
see a long pause then a burst. Passing `enable_thinking: false` via
chat_template_kwargs skips the reasoning phase entirely.
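A minimal sketch of the request body this produces, assuming an OpenAI-compatible chat completions payload. The function name and the `thinking` toggle are illustrative, not Axon's actual code.

```python
def build_chat_payload(model: str, messages: list, thinking: bool = False) -> dict:
    """Build a chat completions body with reasoning disabled by default."""
    return {
        "model": model,
        "messages": messages,
        "stream": True,  # assumption: the widget consumes a streamed response
        # Forwarded to the chat template; False skips the reasoning phase.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }
```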
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3-turn conversations passed at ~9120 chars but 4-turn failed at ~10640.
WAF anomaly threshold is between those values. Lowered all limits to keep
multi-turn conversations well under the threshold.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
WAF anomaly scoring accumulates across the entire request body. After 2-3 turns,
assistant responses containing infrastructure terms (security, scanning, etc.)
push the total past the threshold. Added per-assistant trim (1500 chars) and a
12000-char sliding window that drops oldest messages.
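The two limits can be sketched roughly as below. This assumes the function sees conversation history only (the system prompt is handled separately), that assistant replies are truncated from the end, and that the newest message is always kept; none of these details are confirmed by the commit.

```python
def fit_history(messages: list, assistant_limit: int = 1500, window: int = 12000) -> list:
    """Trim assistant replies, then drop oldest messages until the total fits."""
    trimmed = []
    for msg in messages:
        content = msg["content"]
        if msg["role"] == "assistant":
            content = content[:assistant_limit]  # per-assistant trim
        trimmed.append({**msg, "content": content})

    def total(msgs):
        return sum(len(m["content"]) for m in msgs)

    # Sliding window: discard from the oldest end, always keeping the newest message.
    while len(trimmed) > 1 and total(trimmed) > window:
        trimmed.pop(0)
    return trimmed
```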
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
vLLM requires system messages to be at the beginning. When Axon merges
conversation history with new messages, duplicate system messages cause
a 400 error. Strip all but the first system message.
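In outline, the fix looks like the sketch below. The function name is hypothetical; the sketch keeps the first system message wherever it occurs and does not reorder anything.

```python
def keep_first_system(messages: list) -> list:
    """Drop every system message after the first one, preserving order."""
    result = []
    seen_system = False
    for msg in messages:
        if msg.get("role") == "system":
            if seen_system:
                continue  # duplicate system message from merged history
            seen_system = True
        result.append(msg)
    return result
```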
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The vLLM backend at Bank Dhofar runs behind an Istio/Envoy WAF with
ModSecurity-style anomaly scoring. The ChatWidget's 41KB system prompt
accumulates enough infrastructure/security keywords to trigger a 403.
Trim system messages to 6000 chars (70% head + 30% tail) before
forwarding to vLLM — preserves identity/behavior instructions at the
start and FAQ/response guidelines at the end.
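The head/tail split can be sketched as follows. The function name and the integer-percentage parameter are illustrative, and the sketch simply concatenates the two slices without a separator, which may differ from the actual implementation.

```python
def trim_head_tail(text: str, limit: int = 6000, head_percent: int = 70) -> str:
    """Keep the first head_percent of the budget and fill the rest from the end."""
    if len(text) <= limit:
        return text
    head_len = limit * head_percent // 100  # 4200 chars with the defaults
    tail_len = limit - head_len             # 1800 chars with the defaults
    tail = text[-tail_len:] if tail_len else ""
    return text[:head_len] + tail
```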
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clients (e.g. ChatWidget) send OpenAI model names like gpt-4o-mini which
vLLM doesn't recognize. The provider now queries available models on
startup and remaps any unrecognized name to the configured default.
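A minimal sketch of the remapping step, assuming the backend answers the OpenAI-style model listing with a `data` array of objects carrying an `id`. The function names and the `qwen3-coder` model id in the usage note are illustrative.

```python
def model_ids(models_response: dict) -> set:
    """Extract model ids from an OpenAI-style list-models response body."""
    return {entry["id"] for entry in models_response.get("data", [])}


def resolve_model(requested: str, available: set, default: str) -> str:
    """Pass known names through; remap anything else to the configured default."""
    return requested if requested in available else default
```

With `available = {"qwen3-coder"}`, a request for `gpt-4o-mini` resolves to the default, while `qwen3-coder` is passed through unchanged.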
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduces a provider abstraction so Axon can proxy to either Claude SDK
(existing behavior) or a vLLM-compatible endpoint. Toggled via
AXON_PROVIDER env var ("claude" | "vllm"). When vllm, requests pass
through as-is (no prompt translation), session pool and OAuth are skipped.
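The toggle could look roughly like this. The class and field names are hypothetical, and defaulting to `claude` when the variable is unset is an assumption based on it being the existing behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ProviderConfig:
    name: str
    translate_prompts: bool
    use_session_pool: bool
    use_oauth: bool


_PROVIDERS = {
    "claude": ProviderConfig("claude", True, True, True),
    # vLLM mode: requests pass through as-is, no session pool, no OAuth.
    "vllm": ProviderConfig("vllm", False, False, False),
}


def select_provider(env: Optional[Mapping[str, str]] = None) -> ProviderConfig:
    """Pick the provider from AXON_PROVIDER ("claude" | "vllm")."""
    env = os.environ if env is None else env
    name = env.get("AXON_PROVIDER", "claude").strip().lower()
    if name not in _PROVIDERS:
        raise ValueError(f"unsupported AXON_PROVIDER: {name!r}")
    return _PROVIDERS[name]
```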
Closes openova-io/openova#36
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Part of the brand-consistency sweep (openova-private#116). All brand
surfaces must use the swirl mark with gradient #3B82F6 → #818CF8.
- public/favicon.svg: replaced with canonical mark (was a purple
placeholder logo)
- OOLogo.tsx: default c1 coerced from #38BDF8 → #3B82F6 to match the
brand/logo-mark.svg canonical gradient
- AuthLayout.tsx: OctagonAlert in blue-square placeholder → OOLogo;
label 'OpenOva Corporate' → 'OpenOva Sovereign'
- AppLayout.tsx: same — OctagonAlert → OOLogo; 'Corporate' → 'OpenOva Sovereign'
Part of the tier-URL cutover (openova-private issue #116). Catalyst UI moves
from catalyst.openova.io to console.openova.io/sovereign.
- vite.config.ts: base '/sovereign/' so built HTML/asset refs are prefixed
- src/app/router.tsx: TanStack Router basepath '/sovereign' so Link/navigate
emit /sovereign-prefixed URLs
- src/shared/config/urls.ts (NEW): central BASE / API_BASE / path() helper
- StepReview.tsx: fetch('/api/v1/deployments') → fetch(`${API_BASE}/v1/deployments`)
and window.location.href = '/provision.html' → path('provision.html')
- StepCredentials.tsx: same treatment for /api/v1/credentials/validate
- nginx.conf SPA fallback simplified to try_files $uri /index.html — avoids
nginx's directory trailing-slash redirect that would strip the /sovereign
prefix client-side
No more hardcoded URLs in the source (see feedback_never_hardcode_urls rule).
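A rough sketch of what a central URL helper along these lines could look like. Only the names BASE, API_BASE, and path() come from the commit; the factory wrapper, the "/api" suffix, and the slash handling are assumptions. In the real module the base would presumably come from the build configuration rather than a literal.

```typescript
// Hypothetical shape for shared/config/urls.ts; values are illustrative.
function makeUrls(base: string) {
  // Normalize so BASE always ends with exactly one trailing slash.
  const BASE = base.endsWith("/") ? base : `${base}/`;
  // Assumed API prefix; callers append e.g. "/v1/deployments".
  const API_BASE = `${BASE}api`;
  // Resolve an app-relative file or route against BASE.
  const path = (relative: string): string => BASE + relative.replace(/^\/+/, "");
  return { BASE, API_BASE, path };
}

const urls = makeUrls("/sovereign/");
```

Call sites then read `fetch(`${urls.API_BASE}/v1/deployments`)` and `urls.path("provision.html")`, so moving the app to another prefix touches one place.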
Footer used to be rendered inside StepShell — every step navigation
unmounted the old footer and mounted a new one, which produced a
visible flicker (the 'bounce' the user reported).
Architecture change:
- New shared/lib/wizardNav.ts — tiny Zustand store holding the active
step's nav handlers (onNext, onBack, disabled, loading, label, title)
- StepShell now publishes its nav state via useEffect; renders no footer
- WizardLayout renders ONE persistent footer that reads from wizardNav
Result: footer DOM is stable across step transitions. Buttons swap
behavior synchronously (no remount, no fade-in flicker). Stepper
counter and progress pill stay in place too.
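The publish/read split can be illustrated with a dependency-free stand-in for the Zustand store. This is not the wizardNav.ts implementation: the field list follows the commit, but the function names and the hand-rolled subscription mechanism are assumptions made so the example runs without any library.

```typescript
// Stand-in for the Zustand store; field names follow the commit message.
interface WizardNavState {
  onNext?: () => void;
  onBack?: () => void;
  disabled: boolean;
  loading: boolean;
  label: string;
  title: string;
}

type Listener = (state: WizardNavState) => void;

function createWizardNav() {
  let state: WizardNavState = { disabled: false, loading: false, label: "Continue", title: "" };
  const listeners = new Set<Listener>();
  return {
    getState: (): WizardNavState => state,
    // Called by the active step (StepShell would do this in an effect).
    publish(next: Partial<WizardNavState>): void {
      state = { ...state, ...next };
      listeners.forEach((listener) => listener(state));
    },
    // Called once by the persistent footer; returns an unsubscribe function.
    subscribe(listener: Listener): () => void {
      listeners.add(listener);
      return () => { listeners.delete(listener); };
    },
  };
}
```

Because the footer only subscribes and never unmounts, a step change amounts to one publish call that swaps handlers and labels in place.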
User: 'I don't think anyone will read that section. If you believe
the info is required, put it at the bottom.'
Matching SME's pattern now:
- Compact 1.1rem title with subtle border-bottom divider
- Description paragraph removed from the top
- Description re-rendered at the BOTTOM of the step content as muted
helper text (0.82rem, dashed border-top), so users who want the
context can still read it after completing the main task
- Reclaims ~70-90 px of vertical space above the step content
User requested Corporate inherit every polish element where SME is
better. Changes (all minimum-touch):
Palette (globals.css):
- Accent channel: sky-400 (56,189,248) → SME blue-500 (59,130,246)
- Light-mode accent: sky-600 (2,132,199) → SME blue-600 (37,99,235)
- New --wiz-success-ch: SME emerald (16,185,129 dark / 5,150,105 light)
- Unifies green dots and blue pills with SME on sight
WizardLayout.tsx:
- Stepper circle 'done' state: #22C55E → rgba(var(--wiz-success-ch))
- Stepper separator 'done': same
- Driven entirely by CSS vars — light/dark flips automatically
_shared.tsx (StepShell flat + SME footer):
- Removed outer card wrapper (no bg, no border, no shadow, no blur)
- Content flows flat — child cards are the only surfaces now
- Nav buttons moved to a sticky bottom footer (like SME):
• Back: transparent outline, SME border style
• Continue: solid SME accent, subtle shadow, no gradient
- Backdrop-blur on footer matches the header
- Loading spinner inline in Continue button
User preferred the previous approach — stepper as its own centered
row below the header, matching SME's current pattern exactly.
Reverts the in-header stepper change; restores WizardLayout.tsx to
the state from commit 7ed9239.
User wanted the stepper to live in the header area to reclaim
vertical space, not as a separate row below. Now:
- Header: 3-zone grid (logo · stepper · actions) in one row
- Stepper: inline pills (row: circle + label side-by-side)
- Active pill has accent bg + ring; done shows green circle with check
- Responsive: labels hide below 980px, pill strips to compact dots
- Phone: header reflows to 2 rows (logo/actions + stepper below)
- All chrome fits in ~56 px of header height total
User request: unify both wizards on the horizontal pattern and bring
Corporate in line with SME's look-and-feel (dark/light mode, colors,
cards) with minimum changes.
Minimum-touch changes:
- globals.css: flatten --wiz-page-bg from radial gradient to solid
#0b1220 (dark) / #f8fafc (light) — matches SME's flat bg.
--wiz-panel-bg bumped to #111827 (dark) / #ffffff (light) to match
SME card surfaces.
- WizardLayout.tsx: complete rewrite as a horizontal top-stepper
(header + stepper row + content), mirroring the SME stepper pattern
(32px numbered circles + labels below + 44px connecting lines).
Done circles turn green with a check; active is accent blue with a
soft ring; pending stays as a hollow circle.
- Responsive: labels hide below 720px, circles shrink to 28px so 6
steps remain legible on tablets and phones.
Step content components (StepOrganisation, StepTopology, ...) are
unchanged — they inherit the new palette via the existing --wiz-*
variables.
User feedback: 1km gap between balls and main card, and vertical spacing
between balls was too tight at 22px.
- Body padding-left 40px → 8px (desktop)
- Content wrapper: margin: 0 auto → marginLeft: 0 (left-align to hug
the sidebar; card right edge now rests against the balls)
- Desktop step gap: 22 → 28 (+27%)
- Tablet step gap: 18 → 24 (+33%)
- Content maxWidth 960 → 1000 to fill the extra breathing room
User feedback: previous revision brought back a subtle sidebar pane
(tint + right border), which was the wrong direction. Also, gaps between
balls stretched to fill full viewport height, making spacing excessive.
Redesign:
- Sidebar width 260 → 200 px, NO bg, NO border (fully transparent)
- Fixed 22 px gap between balls — no more flex:1 stretch
- Stepper right-aligned within sidebar so balls sit flush against
the main content card (tight visual proximity, as requested)
- Labels rendered LEFT of balls (one word each — dropped the
two-line title+description pattern)
- Logo also right-aligned to match direction
- Progress bar compact at the bottom, right-aligned
- Tablet variant: icon-only balls, same transparent + centered pattern
Previous redesign killed sidebar bg entirely — content read as left-aligned
because the left 260px was visually empty (no counterbalance).
Also: pending rails used --wiz-border-sub (rgba 255/0.06) for a dashed
pattern, which rendered as invisible. User reported 'no lines between
balls when not selected'.
Fixes:
- Sidebar: subtle tint rgba(var(--wiz-ch), 0.015) + thin right border
rgba(var(--wiz-ch), 0.08). Enough weight to balance the page without
returning to 'menu' feel.
- Rail thickness: 1.5px → 2px for cleaner rendering
- Pending rail: solid rgba(var(--wiz-ch), 0.2) instead of invisible
dashed. Always visible regardless of state.
- Border radius 1px on rails for softer edges.
- Applied consistently to desktop and tablet variants.
Sidebar:
- Removed distinct bg + border + backdrop-filter that made it read as a menu
- Added vertical connecting rail between step circles (solid gradient for
done/current, dashed grey for pending) — clearly signals journey, not nav
- Distributed steps with flex: 1 grow on each item so the rail fills
the full viewport height instead of clustering at top
- Active step circle has a soft pulse ring animation
- Progress bar integrated at rail's end (no hard divider)
- Same rail pattern applied to tablet variant
Rename (user-facing only — internal codename stays "catalyst"):
- index.html title: OpenOva Catalyst → OpenOva Corporate
- WizardLayout logo sub-label: Catalyst → Corporate
- AuthLayout brand text: OpenOva Catalyst → OpenOva Corporate
- AppLayout sidebar label: Catalyst → Corporate
- LoginPage subtitle: "Catalyst account" → "Corporate account"
Not renamed (internal): store names, CSS vars, repo paths, k8s namespace,
catalyst.openova.io domain — avoids SEO/DNS/infra churn.
Replace the live-SSE phase+log view with a static DAG animation page
at /provision.html. Launch OpenOva now redirects there via
window.location. The old React ProvisionPage and /provision route are
removed. Backend POST /api/v1/deployments still fires so the API side
is unchanged; only the rendered provisioning view is swapped.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Skip refresh gracefully when .credentials.json doesn't exist (e.g. CI
smoke test with no Claude auth mounted).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Claude Agent SDK does not refresh OAuth tokens. Axon now:
1. Refreshes the token on startup before creating session pool
2. Runs a periodic refresh every 4 hours
3. Writes refreshed credentials to disk so session subprocesses use them
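The three steps could be wired together roughly as below. This is a sketch under assumptions: the function names are hypothetical, the refresh call is injected rather than implemented, and the timer is passed in so the schedule can be exercised without waiting four hours.

```typescript
const REFRESH_INTERVAL_MS = 4 * 60 * 60 * 1000; // every 4 hours

type SetTimer = (fn: () => void, ms: number) => unknown;

// Hypothetical: registers the periodic refresh and returns the timer handle.
// `refreshAndPersist` is assumed to fetch a new token and write it to disk.
function schedulePeriodicRefresh(
  refreshAndPersist: () => Promise<void>,
  setTimer: SetTimer = setInterval,
): unknown {
  return setTimer(() => {
    // A failed refresh is logged, not thrown, so the schedule keeps running.
    refreshAndPersist().catch((err) => console.error("token refresh failed", err));
  }, REFRESH_INTERVAL_MS);
}

// Hypothetical startup order: refresh first, then schedule, then build the pool.
async function startWithFreshToken(
  refreshAndPersist: () => Promise<void>,
  createSessionPool: () => void,
): Promise<void> {
  await refreshAndPersist();
  schedulePeriodicRefresh(refreshAndPersist);
  createSessionPool();
}
```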
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Claude Agent SDK does not handle OAuth token refresh. Adds a CronJob
(every 4h) that refreshes the token via Anthropic's OAuth endpoint and
updates the K8s secret. Disabled by default.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>