Bundles the two halves of the broken ADR-0003 §3.2 NewAPI admin-API hook so the path goes from dormant-and-misconfigured to actually live:

1. catalyst-api Deployment (bp-catalyst-platform) now sets:
   - CATALYST_NEWAPI_ADDR = "http://newapi-bp-newapi.newapi.svc.cluster.local:3000" (literal; dual-mode Helm+Kustomize contract)
   - CATALYST_NEWAPI_ADMIN_TOKEN via secretKeyRef on `catalyst-newapi-admin-token` key ADMIN_API_TOKEN (optional:true)
2. bp-newapi ExternalSecret target now carries emberstack/reflector mirror annotations (default reflector-allowed-namespaces = "catalyst-system") so the Secret rendered in the `newapi` namespace is materialised in the catalyst-api Pod's namespace (same cross-namespace seam as sme-secrets / catalyst-gitea-token).
3. main.go default URL fallback corrected from the NXDOMAIN `http://newapi.newapi.svc` to the canonical Service URL `http://newapi-bp-newapi.newapi.svc.cluster.local:3000` (same root cause as TBD-V14 / PR #2017: bp-newapi.fullname renders `<Release.Name>-<Chart.Name>` and bootstrap-kit slot 80 sets `releaseName: newapi` against chart `bp-newapi`).
4. newapi/client.go godoc + main.go comments updated to the correct Service URL.

Chart lockstep (Inviolable Principle #14):
- bp-newapi 1.4.32 -> 1.4.33
- bp-catalyst-platform 1.4.224 -> 1.4.225
- bootstrap-kit pins both in lockstep.

Validation:
- go test ./internal/newapi/... ./internal/handler/... PASS
- go build ./cmd/api/ PASS
- helm template products/catalyst/chart/ renders CATALYST_NEWAPI_ADDR=http://newapi-bp-newapi.newapi.svc.cluster.local:3000 + CATALYST_NEWAPI_ADMIN_TOKEN secretKeyRef on catalyst-newapi-admin-token/ADMIN_API_TOKEN.
- kubectl kustomize products/catalyst/chart/templates/ renders the same env vars (dual-mode contract preserved).
- helm template platform/newapi/chart/ -s templates/external-secret.yaml --api-versions=external-secrets.io/v1beta1 renders the reflector annotations on target.template.metadata.annotations.
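For orientation, items 1 and 2 could render roughly as the fragments below. This is a sketch, not the chart's actual output: the env var names, Secret name, key, and Service URL come from the description above, while the surrounding field layout and the exact set of reflector annotation keys are assumptions based on the standard Kubernetes, External Secrets, and emberstack/reflector schemas.

```yaml
# Sketch only. Values are taken from the commit description; the
# surrounding structure is assumed, not copied from the charts.

# catalyst-api Deployment container env (bp-catalyst-platform)
env:
  - name: CATALYST_NEWAPI_ADDR
    # Literal value so Helm and Kustomize render the same string.
    value: "http://newapi-bp-newapi.newapi.svc.cluster.local:3000"
  - name: CATALYST_NEWAPI_ADMIN_TOKEN
    valueFrom:
      secretKeyRef:
        name: catalyst-newapi-admin-token
        key: ADMIN_API_TOKEN
        # optional: the Pod still starts if the mirrored Secret is absent.
        optional: true
---
# bp-newapi ExternalSecret target (assumed annotation set)
spec:
  target:
    name: catalyst-newapi-admin-token
    template:
      metadata:
        annotations:
          reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
          reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "catalyst-system"
          reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
          reflector.v1.k8s.emberstack.com/reflection-auto-namespaces: "catalyst-system"
```

Because the secretKeyRef is `optional: true`, a missing mirror does not block Pod startup, which is why the startup log line matters as the observable signal.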
Per CLAUDE.md §0 anti-theater discipline, this PR uses Refs #2021 (NOT Closes). The issue closes only after a fresh-prov operator walks /console/sme/users -> Add User and observes `sme-users: NewAPI admin client wired` at catalyst-api startup, and the row transitions to state=newapi_created (no `newapi client not wired` sentinel, no NXDOMAIN for `newapi.newapi.svc`).

Refs #2021

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
340 lines
17 KiB
YAML
# bp-newapi: Catalyst Application Blueprint, bootstrap-kit slot 80.
# Multi-tenant LLM marketplace gateway. Ships in backend-only mode: the
# OpenAI-compatible API at api.<sovereign-fqdn>/v1/* is customer-facing;
# the upstream's portal UI is disabled at ingress; Catalyst replaces it
# as the customer surface; NewAPI's admin UI at admin.<sovereign-fqdn>
# is exposed only to ops staff (Keycloak-gated).
#
# This slot enables the SME-tenant turnkey experience (epic #795). The
# Catalyst signup hook (delivered by unified-rbac in #802 against the
# contract recorded in ADR-0003) reads the `catalyst-newapi-admin-token`
# Secret rendered by this chart's ExternalSecret to issue per-user API
# keys against NewAPI's admin API at
# `http://newapi-bp-newapi.newapi.svc.cluster.local:3000` (canonical
# in-cluster Service URL: the bp-newapi `<Release.Name>-<Chart.Name>`
# helper renders `newapi-bp-newapi` for `releaseName: newapi` against
# chart `bp-newapi`; pre-TBD-V15 / #2021 this comment cited the
# wrong bare-`newapi` Service name).
#
# Wrapper chart: platform/newapi/chart/
# Catalyst-curated values: platform/newapi/chart/values.yaml
# Reconciled by: Flux on the new Sovereign's k3s control plane.

---
apiVersion: v1
kind: Namespace
metadata:
  name: newapi
  labels:
    catalyst.openova.io/sovereign: ${SOVEREIGN_FQDN}
---
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: bp-newapi
  namespace: flux-system
spec:
  type: oci
  interval: 15m
  url: oci://ghcr.io/openova-io
  secretRef:
    name: ghcr-pull
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: bp-newapi
  namespace: flux-system
  labels:
    catalyst.openova.io/slot: "80"
spec:
  interval: 15m
  releaseName: newapi
  targetNamespace: newapi
  # bp-newapi depends on:
  #   - bp-openbao(08): the secret backend the chart's ExternalSecret
  #     pulls `ADMIN_API_TOKEN` from. Without OpenBao Ready, the
  #     ExternalSecret never resolves and the Catalyst signup hook can't
  #     reach the NewAPI admin API.
  #   - bp-keycloak(09): the OIDC issuer for the ops-staff admin UI at
  #     admin.<sovereign-fqdn>. Without Keycloak Ready, the OIDC
  #     middleware can't redirect ops-staff requests.
  #   - bp-cnpg(16): operator provisions the Postgres cluster for users,
  #     credits, channels, and audit log via a Crossplane
  #     PostgresqlInstance claim once cnpg is Ready. The DSN is mounted
  #     into NewAPI via `database.existingSecret` (operator-set).
  dependsOn:
    - name: bp-openbao
    - name: bp-keycloak
    - name: bp-cnpg
  chart:
    spec:
      chart: bp-newapi
      # 1.4.0 (issue #943, 2026-05-05): auto-provision CNPG-backed
      # Postgres + chart-emitted SESSION_SECRET/CRYPTO_SECRET so a
      # Sovereign install lands a real Pod without operator intervention.
      # Pre-#943 the Deployment silently skipped render whenever
      # database.existingSecret OR credentials.existingSecret was
      # empty (the bootstrap-kit overlay supplies neither), so NewAPI
      # never came up and alice signup gate 5 (LLM) timed out. Both
      # auto-provisions are capability-gated on bp-cnpg's CRD and
      # operator-overridable per Inviolable Principle #4.
      # 1.3.0: defaultChannels.qwenBankDhofar (channel #1 = Qwen3.6 @
      # https://llm-api.omtd.bankdhofar.com) + post-install/post-upgrade
      # `channel-seed` Helm hook Job that idempotently POSTs default
      # channels into NewAPI's admin API. Issue #915 (epic SME tenant
      # integration DoD: alice → OpenClaw → NewAPI → Qwen3.6@BankDhofar
      # end-to-end).
      # 1.2.0: Traefik Middleware gated behind ingress.middleware.enabled.
      # 1.4.1 (issue #952, 2026-05-05): Pod imagePullSecrets templated +
      # default to `[{name: ghcr-pull}]` so kubelet authenticates pulls
      # of the PRIVATE newapi-mirror + metering-sidecar images. Paired
      # with cloud-init adding `newapi` to flux-system/ghcr-pull's
      # reflector auto-namespaces list.
      # 1.4.2 (qa-loop bounded-cycle audit prov #7 Gap F, 2026-05-10):
      # `.Values.newapi.image.tag` repointed from `v0.4.5` (fictitious,
      # never built by any CI workflow) to `v0.13.2` (actual upstream
      # Calcium-Ion/new-api Docker Hub release, mirrored into
      # ghcr.io/openova-io/openova/newapi-mirror by the new
      # `.github/workflows/build-bp-newapi.yaml` workflow). Pre-1.4.2
      # the NewAPI Pod ImagePullBackOff'd 403 on every fresh Sovereign,
      # blocking alice signup gate 5 (LLM).
      # 1.4.4 (qa-loop bounded-cycle audit prov #20 Fix #138, 2026-05-11):
      # add pre-install/pre-upgrade hook that polls the external-secrets
      # validating-admission webhook until it returns a structured HTTP
      # response; closes the race between bp-external-secrets reaching
      # HR Ready=True and the apiserver-side EndpointSlice for the
      # webhook Service being observable. Pre-1.4.4 the chart's
      # ExternalSecret apply was rejected with `no endpoints available
      # for service "external-secrets-webhook"` on every fresh provision,
      # blocking the chart from reaching Ready and the Catalyst signup
      # hook (ADR-0003 §3.2) from finding the admin-token Secret.
      # 1.4.10 (fix-convergence-wave11, 2026-05-18): gate the
      # defaultChannels.qwenBankDhofar entry on attestation-complete
      # rather than hard-failing the helm template. Pre-1.4.10 the
      # chart raised `commercial-contract attestation requires accountId`
      # on every Sovereign that opted in to marketplace
      # (MARKETPLACE_ENABLED=true) without ALSO supplying a signed
      # commercial contract's `LLM_BANK_DHOFAR_ACCOUNT_ID` /
      # `LLM_BANK_DHOFAR_CONTRACT_REF` envsubst variables. Post-1.4.10
      # the chart silently skips the qwenBankDhofar channel when
      # attestation is incomplete; once the operator overlay supplies
      # the attestation values the channel composes on the next
      # reconcile.
      # 1.4.12 (PR #1677, 2026-05-18): default
      # `.Values.sandboxTokenSigningKey.reflectorNamespaces` flipped
      # from `"sandbox"` → `"catalyst-system,sandbox"`. Pre-1.4.12 the
      # chart-emitted `newapi-bp-newapi-token-signing-key` Secret was
      # mirrored only into a `sandbox` namespace (which does NOT exist
      # on a stock Sovereign; bp-sandbox installs into
      # `catalyst-system` per slot 19a `targetNamespace`); the sandbox-
      # controller's `NEWAPI_ADMIN_SECRET` env var (secretKeyRef
      # `optional: true`) landed EMPTY, the controller silently dropped
      # into gitops-only mode, and zero per-Sandbox LLM-gateway tokens
      # were ever minted (operator-visible only via the controller's
      # `newapi_admin_secret_set=false` startup log). Caught on t22
      # 2026-05-18 (TBD-D14). Bumping the pin pulls the post-#1677
      # default so reflector mirrors into `catalyst-system` too.
      # 1.4.14 (current main, 2026-05-18): latest upstream-tracking
      # chart cut; includes 1.4.12's reflector fix.
      # 1.4.19 (TBD-A12 #1798, 2026-05-18): add startupProbe so kubelet
      # does NOT SIGKILL the binary at the 50s mark while GORM
      # AutoMigrate is still in-flight on the freshly-provisioned empty
      # `newapi` CNPG database. Pre-1.4.19 the empty DB on t22 sat with
      # ZERO tables after 29 CrashLoopBackOff restarts: every kill
      # raced AutoMigrate's first CREATE TABLE call mid-TLS-handshake;
      # pg_stat_activity on the CNPG primary showed no `newapi` user
      # connections because the kill happened before the GORM
      # connection pool's first wire write completed. Probe budget:
      # 30 × 10s = 5 min, comfortably above the observed 60-120s
      # ceiling on cpx21/cpx31 nodes with sslmode=require.
      # TBD-A39 #1834 (2026-05-19): bp-newapi 1.4.27 replaces the
      # Helm-`lookup`-based DSN Secret render (which raced CNPG on
      # first install and committed an empty password; t32 newapi
      # Pod was 21x CrashLoopBackOff with `password authentication
      # failed for user "newapi"`) with a post-install Job that polls
      # `<cluster>-app` and PATCHes the SQL_DSN bytes. Canonical
      # database-secret-sync-job pattern lifted from
      # platform/gitea/chart/templates/database-secret-sync-job.yaml
      # (issue #830 Bug 2) + platform/wordpress-tenant/chart/templates/
      # database-secret-sync-job.yaml (issue #1786).
      # 1.4.29 (TBD-A52 #1944): default Valkey URL was
      # `valkey.valkey.svc.cluster.local` which is NXDOMAIN: the
      # bp-valkey bitnami chart with architecture=replication exposes
      # `valkey-primary` / `valkey-replicas` / `valkey-headless`, not a
      # plain `valkey` Service. Caused 31× CrashLoopBackOff on t34.
      # bp-newapi 1.4.29 ships the corrected
      # `valkey-primary.valkey.svc.cluster.local` default.
      # 1.4.31 (TBD-V21 #2032, 2026-05-20): extend default
      # `sandboxTokenSigningKey.reflectorNamespaces` to include the
      # `sandbox-.*` regex pattern so emberstack/reflector mirrors the
      # SIGNING_KEY Secret into every per-Sandbox namespace. Paired with
      # bp-sandbox 0.3.2 which mounts SIGNING_KEY as the MCP's
      # `SANDBOX_JWT_SECRET` env (closes auth-gate-stays-in-test-mode
      # silent-breakage).
      # 1.4.33 (TBD-V15 #2021, 2026-05-20): catalyst-newapi-admin-token
      # ExternalSecret target now carries reflector mirror annotations
      # (default to `catalyst-system`) so the rendered Secret is
      # available in the catalyst-api Pod's namespace via secretKeyRef.
      # Companion to bp-catalyst-platform 1.4.225 which adds the
      # secretKeyRef itself + the corrected CATALYST_NEWAPI_ADDR
      # literal (`http://newapi-bp-newapi.newapi.svc.cluster.local:3000`).
      version: 1.4.33
      sourceRef:
        kind: HelmRepository
        name: bp-newapi
        namespace: flux-system
  # Event-driven install per docs/INVIOLABLE-PRINCIPLES.md #3 (Flux
  # dependsOn is the gate, not Helm timeout). NewAPI itself starts in
  # ~10 s once the Postgres DSN Secret is present; the long pole is
  # waiting for the operator's Crossplane claim to materialise the DB.
  install:
    timeout: 15m
    disableWait: true
    remediation:
      retries: 3
  upgrade:
    timeout: 15m
    disableWait: true
    remediation:
      retries: 3
  # Per-Sovereign overrides. The operator MUST supply at install time:
  #   - ingress.host = api.${SOVEREIGN_FQDN}
  #   - ingress.adminHost = admin.${SOVEREIGN_FQDN}
  #   - auth.adminUI.keycloak.issuer = https://auth.${SOVEREIGN_FQDN}/realms/ops
  #   - database.existingSecret = Postgres DSN Secret (from the
  #     Crossplane PostgresqlInstance claim)
  #   - credentials.existingSecret = SESSION_SECRET + CRYPTO_SECRET
  #     (rotated via OpenBao)
  #   - catalystIntegration.externalSecret.remoteRef.key
  #     = sovereign/${SOVEREIGN_FQDN}/newapi/admin-token
  #   - defaultChannels.vllm.enabled = true (first-otech)
  #   - defaultChannels.vllm.endpoint
  #     + defaultChannels.vllm.attestation.owner
  #
  # Defaults below wire the first-otech provider channel to the same
  # upstream the OpenOva marketing site uses (Qwen via Axon →
  # `llm-api.omtd.bankdhofar.com`, model `qwen3-coder`); the operator
  # overlay overrides any of these by setting them in this HelmRelease's
  # spec.values.
  values:
    sovereignFQDN: ${SOVEREIGN_FQDN}
    ingress:
      host: api.${SOVEREIGN_FQDN}
      adminHost: admin.${SOVEREIGN_FQDN}
      tls:
        enabled: true
        issuer: letsencrypt-prod
      # Cilium Gateway HTTPRoute for `newapi.<fqdn>` (TBD-D35d, issue
      # #1778). Sandbox runtimes hit the LLM gateway at the URL the
      # sandbox controller mints into their environment
      # (`NEWAPI_BASE_URL=https://newapi.${SOVEREIGN_FQDN}/v1`). Without
      # this HTTPRoute the marketplace `tenant-wildcard` (hostnames=
      # `*.${SOVEREIGN_FQDN}`) absorbs every newapi.${SOVEREIGN_FQDN}
      # request and forwards to the storefront `console` Service,
      # blocking the entire BYOS Claude Code journey at the LLM gate.
      # An exact-hostname HTTPRoute outranks the wildcard per Gateway
      # API spec, so enabling this on every Sovereign restores LLM
      # reachability without touching the marketplace wildcard.
      httpRoute:
        enabled: true
        host: newapi.${SOVEREIGN_FQDN}
    auth:
      adminUI:
        mode: keycloak
        keycloak:
          issuer: https://auth.${SOVEREIGN_FQDN}/realms/ops
          clientId: newapi-admin
          existingSecret: newapi-oidc
      customerAPI:
        keyIssuer: catalyst
    catalystIntegration:
      enabled: true
      existingSecret: catalyst-newapi-admin-token
      externalSecret:
        enabled: true
        refreshInterval: "1h"
        secretStoreRef:
          kind: ClusterSecretStore
          name: vault-region1
        remoteRef:
          # Canonical OpenBao path per docs/INVIOLABLE-PRINCIPLES.md #4.
          # Under the `vault-region1` store's `secret/` mount the full
          # path is `secret/sovereign/<fqdn>/newapi/admin-token`.
          key: sovereign/${SOVEREIGN_FQDN}/newapi/admin-token
          property: ADMIN_API_TOKEN
    # Default channels: chart-side composition (channel #1 first).
    #
    # `qwenBankDhofar` (issue #915) is the canonical first channel:
    # Qwen3.6 hosted at BankDhofar (https://llm-api.omtd.bankdhofar.com,
    # model `qwen3-coder` / alias `qwen3.6`), the SAME relay the
    # OpenOva marketing site's Axon helmrelease consumes
    # (openova-private/clusters/contabo-mkt/apps/axon/helmrelease.yaml).
    # Disabled in the template so a fresh Sovereign does not silently
    # wire customers to a third-party endpoint; per-Sovereign overlays
    # (clusters/<sovereign>/bootstrap-kit/80-newapi.yaml) enable this
    # block and supply:
    #   - defaultChannels.qwenBankDhofar.enabled = true
    #   - defaultChannels.qwenBankDhofar.endpoint = https://llm-api.omtd.bankdhofar.com
    #   - defaultChannels.qwenBankDhofar.attestation.accountId (legal-team-owned)
    #   - defaultChannels.qwenBankDhofar.attestation.contractRef (legal-team-owned)
    #   - the Secret `newapi-channel-qwen-bankdhofar` containing the
    #     upstream API key under key `API_KEY` (or an ExternalSecret
    #     pulling from OpenBao at
    #     `sovereign/<sovereign-fqdn>/newapi/channel-qwen-bankdhofar`)
    #   - auth.adminUI.masterKeySecret = name of a Secret carrying
    #     `MASTER_KEY` (NewAPI bootstrap admin auth); required for
    #     the channel-seed Helm hook Job to POST against the admin API
    #     ONCE at install time. Operator may rotate the master key out
    #     post-bootstrap; channels persist in Postgres.
    #
    # When the operator flips `qwenBankDhofar.enabled: true`, the
    # chart's post-install/post-upgrade `channel-seed` Job probes
    # NewAPI's admin API (`/api/channel/?keyword=<name>`) and POSTs
    # the channel definition idempotently. Re-runs after upgrades
    # are no-ops once the channel exists.
    #
    # The legacy `vllm` slot (in-cluster vLLM fallback) remains for
    # operators that run their own bp-vllm + open-weight model in-
    # cluster; it composes after `qwenBankDhofar` and any operator
    # `.Values.channels`.
    # Sandbox Wave 4 (2026-05-18, retry of sandbox-wave4-newapi-sovereign-install):
    # qwenBankDhofar is now gated on `${MARKETPLACE_ENABLED:-false}`, the
    # same envsubst variable bp-catalyst-platform (slot 13) reads to flip
    # marketplace.enabled on the Catalyst control plane. This lets a
    # franchised Sovereign with `MARKETPLACE_ENABLED=true` auto-seed the
    # default Bank Dhofar Qwen3.6 channel without the operator having to
    # supply per-Sovereign overlay values. The endpoint defaults to the
    # canonical first-otech relay; `LLM_BANK_DHOFAR_BASE_URL` overrides
    # it (e.g. for staging at https://omtd.bankdhofar.com). The upstream
    # API key MUST be present in the Secret `newapi-channel-qwen-bankdhofar`
    # under key `API_KEY`, either pre-seeded by cloud-init or pulled from
    # OpenBao via the operator's ExternalSecret at path
    # `sovereign/<fqdn>/newapi/channel-qwen-bankdhofar`. Sandbox agents
    # (sandbox-wave4) depend on this channel being live on every Sovereign
    # that opted in to marketplace; without it the agents fall back to
    # mothership newapi, defeating the per-Sovereign sandboxing.
    defaultChannels:
      qwenBankDhofar:
        enabled: ${MARKETPLACE_ENABLED:-false}
        name: qwen3.6-bankdhofar
        endpoint: ${LLM_BANK_DHOFAR_BASE_URL:-https://llm-api.omtd.bankdhofar.com}
        models:
          - qwen3.6
          - qwen3-coder
        existingSecret: newapi-channel-qwen-bankdhofar
        existingSecretKey: API_KEY
        attestation:
          kind: commercial-contract
          accountId: ${LLM_BANK_DHOFAR_ACCOUNT_ID:-}
          contractRef: ${LLM_BANK_DHOFAR_CONTRACT_REF:-}
      vllm:
        enabled: false
        name: qwen
        endpoint: ""
        models:
          - qwen3-coder
        attestation:
          kind: in-cluster
          owner: ${SOVEREIGN_FQDN}