openova/platform/stalwart-tenant
e3mrah 0a45a790e7
fix: omit HTTPRoute sectionName across blueprint charts — match PR #1888 pattern (Closes #1902) (#1909)
PR #1888 (TBD-A30) fixed catalyst-system HTTPRoutes for multi-zone
Sovereigns whose Cilium Gateway renames HTTPS listeners from `https` to
`https-<sanitised-zone>` (e.g. `https-omani-works`, `https-omani-homes`)
when more than one parent zone is enabled. Every public HTTPRoute pinned
to `sectionName: https` got `Accepted=False NoMatchingListener` and the
hosted service 404'd / connection-refused.
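
The failure mode described above can be sketched in a few lines of Python. This is an illustration only: the sanitising rule (lower-case, non-alphanumerics become `-`) is an assumption chosen to reproduce the two example listener names, and both function names are hypothetical rather than taken from Cilium or the charts.

```python
import re


def listener_names(zones):
    """Return the HTTPS listener names for the enabled parent zones.

    Assumption: a single zone keeps the plain `https` listener, while
    several zones yield one `https-<sanitised-zone>` listener each.
    """
    if len(zones) <= 1:
        return ["https"]
    return [
        "https-" + re.sub(r"[^a-z0-9]+", "-", zone.lower()).strip("-")
        for zone in zones
    ]


def route_attaches(section_name, listeners):
    """A pinned route needs an exact listener-name match.

    A route without sectionName (None) is not filtered by listener name
    and is left to hostname matching.
    """
    return section_name is None or section_name in listeners
```

With two zones enabled, a route pinned to `https` finds no listener of that name, while an unpinned route is still eligible.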

That fix only touched products/catalyst/chart. Per-blueprint HTTPRoutes
shipped the same `sectionName: https` default in values.yaml, so on a
multi-zone Sovereign every blueprint route — gitea, grafana, harbor,
keycloak, newapi, openbao, powerdns, stalwart-tenant — silently failed
to attach. TBD-A40 / issue #1902.

Sweep verbatim:

  $ git grep -nE 'sectionName:[[:space:]]+(https|"https")[[:space:]]*$' \
      platform/*/chart/ products/ clusters/ core/ 2>/dev/null \
      | grep -v 'platform/gateway-api/chart/templates'
  platform/gitea/chart/values.yaml:168:    sectionName: https
  platform/grafana/chart/values.yaml:124:    sectionName: https
  platform/harbor/chart/values.yaml:437:    sectionName: https
  platform/keycloak/chart/values.yaml:482:    sectionName: https
  platform/newapi/chart/values.yaml:721:      sectionName: https
  platform/openbao/chart/values.yaml:72:    sectionName: https
  platform/powerdns/chart/values.yaml:407:      sectionName: https
  platform/stalwart-tenant/chart/values.yaml:297:      sectionName: https
  products/catalyst/bootstrap/api/internal/handler/sme_tenant_gitops.go:802:        sectionName: https

Fix (Option C — omit sectionName, same as PR #1888):

  - 8 blueprint values.yaml defaults flipped from `sectionName: https` to
    `sectionName: ""`. The chart templates already guard with `{{- with
    .Values.gateway.parentRef.sectionName }}`, so a blank value drops the
    field entirely and Cilium Gateway matches by hostname filter.

  - platform/newapi/chart/templates/httproute.yaml was the outlier: it
    used `default "https" $parent.sectionName` which fell back to `https`
    even when values.yaml said empty. Rewritten to `{{- with
    $parent.sectionName }}` so empty drops the field — same pattern as
    the other 7 blueprints.

  - products/catalyst/bootstrap/api/internal/handler/sme_tenant_gitops.go
    renders a per-tenant bp-keycloak HelmRelease and injected
    `sectionName: https` into spec.values. Flipped to `sectionName: ""`
    so the bp-keycloak chart's `{{- with }}` guard drops the field.
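
The difference between the two template guards is easier to see outside Helm. The sketch below only mimics the semantics of a `with` block and of `default "https"` in Python; the function names and the `gateway` parent name are hypothetical and do not come from the charts.

```python
def render_with(section_name):
    """Mimic `{{- with $parent.sectionName }}`: emit the field only if non-empty."""
    lines = ["parentRefs:", "  - name: gateway"]  # hypothetical parent name
    if section_name:  # an empty string is falsy, so the field is skipped
        lines.append(f'    sectionName: "{section_name}"')
    return "\n".join(lines)


def render_default(section_name):
    """Mimic `default "https" $parent.sectionName`: empty falls back to "https"."""
    value = section_name or "https"
    return "\n".join(
        ["parentRefs:", "  - name: gateway", f'    sectionName: "{value}"']
    )
```

An empty value therefore disappears under the `with` guard but silently becomes `https` under `default`, which is the behaviour the newapi template had before this change.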

Validation (real `helm template`, default values, gateway enabled, no
sectionName override) — Principle #15:

  gitea            : sectionName lines in rendered output = 0
  grafana          : sectionName lines in rendered output = 0
  harbor           : sectionName lines in rendered output = 0
  keycloak         : sectionName lines in rendered output = 0
  openbao          : sectionName lines in rendered output = 0
  powerdns         : sectionName lines in rendered output = 0
  newapi           : sectionName lines in rendered output = 0
  stalwart-tenant  : sectionName lines in rendered output = 0

Override path preserved — `--set ...parentRef.sectionName=https-omani-works`
on each chart renders `sectionName: "https-omani-works"` correctly,
so operators on single-zone clusters or non-Cilium gateways can still
pin explicitly via bootstrap-kit overlay.

helm lint clean on all 8 blueprint charts (newapi cnpg-cluster.yaml lint
error is pre-existing on origin/main, unrelated to this fix).

Chart bumps (each blueprint also bumps blueprint.yaml spec.version per
#817 lockstep):
  bp-gitea            1.2.7  -> 1.2.8
  bp-grafana          1.0.1  -> 1.0.2
  bp-harbor           1.2.17 -> 1.2.18
  bp-keycloak         1.4.5  -> 1.4.6
  bp-newapi           1.4.22 -> 1.4.23
  bp-openbao          1.2.16 -> 1.2.17
  bp-powerdns         1.2.3  -> 1.2.4
  bp-stalwart-tenant  0.1.2  -> 0.1.3

Refs TBD-A40.

Co-authored-by: hatiyildiz <hatice.yildiz@openova.io>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-19 07:57:12 +04:00
chart fix: omit HTTPRoute sectionName across blueprint charts — match PR #1888 pattern (Closes #1902) (#1909) 2026-05-19 07:57:12 +04:00
blueprint.yaml fix: omit HTTPRoute sectionName across blueprint charts — match PR #1888 pattern (Closes #1902) (#1909) 2026-05-19 07:57:12 +04:00
README.md feat(bp-stalwart-tenant): per-SME dedicated mail server v0.1.0 (#801) (#815) 2026-05-04 22:22:46 +04:00

bp-stalwart-tenant

Per-SME (per-vcluster) dedicated Stalwart mail server. Implements locked decision [Q3] of EPIC #795 — every SME on a Sovereign gets its own Stalwart in its tenant namespace, with its own domain, own MTA reputation, and own queue.

Status: v0.1.0 (Application Blueprint, scratch chart) | Updated: 2026-05-04 (#801)

NOT the same as the otech-shared openova.io Stalwart in openova-private/clusters/contabo-mkt/apps/stalwart/ — that is the OpenOva-corp mail server. This Blueprint is the per-SME-tenant mail server that ships inside each SME vcluster.


Why per-tenant (and the trade-off)

Locked in #795: the founder explicitly chose this over a shared otech-level multi-domain Stalwart. The trade-off buys:

  • Stronger isolation — one SME's deliverability problem doesn't affect another SME's MTA reputation.
  • Per-customer DKIM — each SME signs with their own key on their own domain.
  • Per-customer queue — bounce-floods, blocklist hits, rate-limit pushes from one SME stay in their queue.

Cost: mail-server resources multiply by N tenants. Each install = 1 small StatefulSet (100m / 256Mi requests) + 1 PVC (default 20Gi). #795 trade-off table tracks this.


What ships

| Resource | Purpose |
|---|---|
| StatefulSet | Stalwart pod, single replica, RocksDB on PVC |
| Service (×3) | LoadBalancer for SMTP/submission/submissions, LoadBalancer for IMAP/IMAPS, ClusterIP for webmail/JMAP |
| HTTPRoute or Ingress | Webmail UI at mail.<domain> (Cilium Gateway by default; Traefik fallback) |
| ConfigMap (config) | Stalwart bootstrap config.toml, applied when RocksDB is empty |
| ConfigMap (dns-records-required) | MX/SPF/DKIM/DMARC records the SME admin must publish; surfaced by the unified-rbac UI |
| ExternalSecret (admin) | Pulls Stalwart admin password from OpenBao |
| ExternalSecret (oidc) | Pulls Keycloak client secret from OpenBao |
| Job (post-install) | Bootstraps admin principal + send-allow row (idempotent) |
| NetworkPolicy | Default-deny + explicit allows for SMTP/IMAP/webmail/Keycloak/PowerDNS/DNS/outbound SMTP |
| ServiceAccount | Identity for the Stalwart pod and the setup Job |

SSO via SME-vcluster Keycloak

The Stalwart webmail authenticates users against the SME's per-vcluster Keycloak realm — NOT the otech-level Keycloak.

The OIDC client stalwart is registered in the SME realm at vcluster provisioning time (handled by #804 — tenant provisioning pipeline). The client secret is written to OpenBao at the canonical path:

sovereign/<sovereign-fqdn>/stalwart/<tenant>/oidc → property OIDC_CLIENT_SECRET

The chart's oidc-externalsecret.yaml pulls it down into the SME tenant namespace.
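
As a small illustration of the path convention, the helper below assembles the OpenBao key from its parts. The function name is hypothetical; the `admin` kind is taken from the admin `remoteRef` key in the overlay example in this README.

```python
def openbao_secret_path(sovereign_fqdn, tenant, kind):
    """Build the canonical OpenBao key for a tenant's Stalwart secret.

    `kind` is "oidc" (Keycloak client secret) or "admin" (admin password).
    """
    if kind not in ("oidc", "admin"):
        raise ValueError(f"unknown secret kind: {kind!r}")
    return f"sovereign/{sovereign_fqdn}/stalwart/{tenant}/{kind}"
```

For example, `openbao_secret_path("omantel.omani.works", "acme", "oidc")` gives the key used in the overlay for tenant `acme`.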

Per-user mailbox provisioning is event-driven (per ADR-0003 §3): when the SME admin creates a user via the unified-rbac console, the unified-rbac service POSTs to Stalwart's /api/principal admin API to create the mailbox. This chart ships only the bootstrap admin principal in the post-install Job; it does not loop on the NATS subject by default. Per-tenant overlays may set mailboxProvisioner.natsSubscriber.enabled=true once the SME vcluster's NATS subject is wired.
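
A rough sketch of what such a provisioning call might look like is shown below. Only the `/api/principal` path comes from this README. The payload fields (`type`, `name`, `emails`), the bearer-token auth scheme, and the function name are assumptions for illustration and should be checked against the Stalwart management API before use. The function only builds the request; it does not send it.

```python
import json
import urllib.request


def build_create_mailbox_request(base_url, admin_token, username, email):
    """Build (but do not send) a POST to Stalwart's /api/principal endpoint."""
    payload = {
        "type": "individual",  # assumed principal type for a user mailbox
        "name": username,
        "emails": [email],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/principal",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {admin_token}",  # assumed auth scheme
        },
    )
```

A caller would pass the result to `urllib.request.urlopen` and treat any non-2xx response as a provisioning failure to retry.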


Domain modes

Free-subdomain mode (default)

Operator overlay sets domain.primary: <slug>.<otech-fqdn> (e.g. acme.omantel.omani.works). The chart records the required DNS records in the *-dns-records-required ConfigMap and a follow-up controller (in unified-rbac) posts them to the otech PowerDNS API.

BYO domain mode

Operator overlay sets domain.primary: acme.com and domain.mode: byo. The records ConfigMap is still emitted; the unified-rbac console UI surfaces them to the SME admin to paste into their public DNS provider. Smoke test in #804 asserts the records are reachable post-creation.


Required DNS records (rendered into the ConfigMap)

| Kind | Name | Value template |
|---|---|---|
| MX | <domain> | priority 10 → mail.<domain> |
| TXT | <domain> | v=spf1 mx <policy> (default -all = hard fail) |
| TXT | <selector>._domainkey.<domain> | v=DKIM1; k=ed25519; p=<DKIM-PUBLIC-KEY> (the public-key blob is stamped in by the unified-rbac controller after first-boot DKIM mint) |
| TXT | _dmarc.<domain> | v=DMARC1; p=reject; rua=mailto:dmarc@<domain> (operator-tunable) |
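
The record templates above can be expanded for a concrete domain as follows. This is a minimal sketch, not the chart's rendering logic: the function name and the tuple layout are hypothetical, and the MX value is written in zone-file style (`10 mail.<domain>`).

```python
def required_dns_records(domain, selector, dkim_public_key,
                         spf_policy="-all", dmarc_rua=None):
    """Return (kind, name, value) tuples for the records an SME must publish."""
    rua = dmarc_rua or f"dmarc@{domain}"
    return [
        ("MX", domain, f"10 mail.{domain}"),
        ("TXT", domain, f"v=spf1 mx {spf_policy}"),
        ("TXT", f"{selector}._domainkey.{domain}",
         f"v=DKIM1; k=ed25519; p={dkim_public_key}"),
        ("TXT", f"_dmarc.{domain}", f"v=DMARC1; p=reject; rua=mailto:{rua}"),
    ]
```

The DKIM public key is an input here because, as noted in the table, it only exists after Stalwart mints the key on first boot.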

Stalwart config.toml gotchas

The bootstrap config.toml follows the pattern committed by the openova-private contabo-mkt Stalwart, with two memory-recorded gotchas:

  1. == not = in expression matchers (queue routing, sieve conditions, send-allow expressions). A single = is assignment and silently never matches (incident 2026-04-14, huawei.com TLS rule). Every comparison in templates/config-configmap.yaml uses ==. Per-tenant overlays adding queue-routing rules MUST follow the same convention. See stalwart_expression_syntax.md memory.

  2. Group principals need explicit email-receive — Stalwart group principals do NOT inherit email-receive from the default user role. Without it, every inbound email to the group bounces with 550 5.5.0 This account is not authorized to receive email. (incident 2026-04-20). The post-install Job's PATCH on the admin principal is the canonical fix; future shared-mailbox additions in tenant overlays MUST PATCH the same field. See stalwart_send_as.md memory.
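
The first gotcha lends itself to a simple pre-commit check. The sketch below flags a lone `=` in an expression string. It is a hypothetical helper, not part of the chart, and it is deliberately naive: an `=` inside a quoted string literal would also be flagged.

```python
import re

# A lone "=" that is not part of "==", "!=", "<=" or ">=".
_SINGLE_EQUALS = re.compile(r"(?<![=!<>])=(?!=)")


def has_single_equals(expression):
    """Return True if the expression contains a bare "=" comparison."""
    return _SINGLE_EQUALS.search(expression) is not None
```

Running it over every expression matcher in a tenant overlay before merge would catch the kind of rule that silently never matches. The variable name `rcpt_domain` used when trying it out is only an example.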

The bootstrap config.toml is applied only once — when RocksDB is empty (first install). Subsequent runtime config edits via webadmin or stalwart-cli persist in RocksDB and do not sync back to the ConfigMap. For disaster recovery, snapshot the running configuration via stalwart-cli server list-config and re-render this ConfigMap.


Inbound spam filtering

Disabled by default per the founder directive on the corp Stalwart (feedback_no_spam_filtering.md memory) — accept everything, filter at the client. Per-SME deployments inherit the same posture; individual SMEs may opt in via webadmin runtime config.


Required values (per-tenant overlay)

# clusters/<sovereign>/sme-overlays/<tenant>/stalwart.yaml
domain:
  primary: "acme.omantel.omani.works"   # or "acme.com" for BYO
  mode: "free-subdomain"                 # or "byo"

keycloak:
  realmURL: "https://auth.acme.omantel.omani.works/realms/sme"
  clientID: "stalwart"
  clientSecretName: "stalwart-oidc"
  oidcExternalSecret:
    remoteRef:
      key: "sovereign/omantel.omani.works/stalwart/acme/oidc"

admin:
  externalSecret:
    remoteRef:
      key: "sovereign/omantel.omani.works/stalwart/acme/admin"

dns:
  powerdns:
    enabled: true
    apiURL: "https://pdns.omantel.omani.works/api"
    apiKeySecretName: "powerdns-api-key"
    zone: "omantel.omani.works"
  dmarc:
    rua: "dmarc@acme.omantel.omani.works"
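
Before committing an overlay, it can help to check that the values above are all present. The sketch below works on an already-parsed overlay (a plain dict). Which keys count as required is an assumption drawn from the example above, and the function names are hypothetical.

```python
REQUIRED_KEYS = [
    "domain.primary",
    "domain.mode",
    "keycloak.realmURL",
    "keycloak.clientID",
    "keycloak.oidcExternalSecret.remoteRef.key",
    "admin.externalSecret.remoteRef.key",
]
ALLOWED_MODES = {"free-subdomain", "byo"}


def _lookup(values, dotted):
    """Walk a nested dict along a dotted path; return None if any part is absent."""
    node = values
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def validate_overlay(values):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing: {key}" for key in REQUIRED_KEYS
                if _lookup(values, key) in (None, "")]
    mode = _lookup(values, "domain.mode")
    if mode not in (None, "") and mode not in ALLOWED_MODES:
        problems.append(f"invalid domain.mode: {mode}")
    return problems
```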

Capacity

Default per-tenant: 100m / 256Mi requests, 1 CPU / 1Gi limits, 20Gi PVC. These defaults comfortably serve roughly 50 mailboxes and 5 GB of mail spool; bump stalwart.resources and persistence.spool.size per-tenant for larger SMEs. Each tenant runs a single replica because Stalwart's RocksDB store is single-writer by design at this tier.
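
Because every tenant gets its own install, the aggregate footprint grows linearly with the tenant count. A back-of-the-envelope sketch using the defaults above (the function name is hypothetical, and per-tenant overrides are ignored):

```python
def fleet_footprint(tenants):
    """Aggregate default requests, limits and storage for `tenants` installs."""
    return {
        "cpu_requests_cores": tenants * 100 / 1000,  # 100m per tenant
        "memory_requests_gib": tenants * 256 / 1024,  # 256Mi per tenant
        "cpu_limits_cores": tenants * 1,              # 1 CPU per tenant
        "memory_limits_gib": tenants * 1,             # 1Gi per tenant
        "pvc_gib": tenants * 20,                      # 20Gi PVC per tenant
    }
```

For 40 tenants this works out to 4 cores and 10 GiB of requests plus 800 GiB of persistent storage.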


Related

  • EPIC #795 — SME-tenant turnkey experience
  • #796 — Hook contract (ADR-0003)
  • #802 — Unified RBAC SME-tier (consumes the dns-records ConfigMap)
  • #804 — Tenant provisioning pipeline (registers OIDC client + writes secrets)
  • #805 — End-to-end demo

Part of OpenOva