Case Study: Multi-Tenant Agent

One client per machine. Fleet operations with hard boundaries.

We redesigned deployment so each client runs isolated on its own machine with standardized state directories, code-only release artifacts, and fresh target provisioning. Then we removed a dangerous fallback path, remediated leaked data, and shipped a hardening pass with 18 bug fixes plus sensitive-section stripping.

Snapshot

What changed in the platform

  • 1:1: one-client-per-machine architecture, no mixed-tenant runtime
  • 100%: code-only deploy rule (no secrets, no state, no memory transfer)
  • 18: hardening fixes shipped in one pass
  • 0: accepted cross-tenant leakage paths after remediation
Architecture

One-client-per-machine with standardized state layout

Each deployment target now maps to exactly one client runtime. We standardized state directories so every machine follows the same file contract for logs, runtime artifacts, and local operational data. This removed ambiguity in fleet operations and made incident response deterministic.

  • Dedicated host per client workload
  • Consistent state directory schema across all machines
  • No shared process memory or tenant-level fallback lookup
  • Operational scripts aligned to the same path structure
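The shared file contract above can be sketched as a small layout check. This is a minimal illustration, assuming a hypothetical state root and directory names; the actual paths in the fleet's contract are not shown in this case study.

```python
from pathlib import Path

# Hypothetical per-machine state contract; the root and directory
# names below are illustrative, not the real fleet layout.
STATE_ROOT = Path("/var/lib/agent")
REQUIRED_DIRS = ["logs", "runtime", "data"]

def check_state_layout(root: Path = STATE_ROOT) -> list[str]:
    """Return the required state directories missing under root.

    An empty list means the machine conforms to the schema, which is
    what makes operational scripts and incident response deterministic.
    """
    return [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
```

Because every machine answers the same check, a single script can audit the whole fleet without per-client special cases.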
Deploy Flow

End-to-end deployment flow after hardening

Step 1
Build release bundle from source code only

Deploy artifacts now include code and deterministic config templates only. Secrets, runtime state, and memory files are excluded by rule.
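A code-only bundle rule can be expressed as a path filter. The suffixes and directory names below are assumptions chosen for illustration; the real pipeline's exclusion list is not published here.

```python
from pathlib import PurePosixPath

# Illustrative exclusion rule: these names are assumptions, not the
# actual pipeline configuration.
EXCLUDED_SUFFIXES = {".env", ".key", ".pem"}
EXCLUDED_DIRS = {"state", "memory", "secrets"}

def allowed_in_bundle(path: str) -> bool:
    """Return True if a file may be packed into a code-only release bundle."""
    p = PurePosixPath(path)
    if p.suffix in EXCLUDED_SUFFIXES:
        return False
    # Reject anything living under a state, memory, or secrets directory.
    return not any(part in EXCLUDED_DIRS for part in p.parts[:-1])
```

Enforcing the rule at bundle-build time, rather than by convention, is what turns "no secrets in deploys" from a guideline into a property.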

Step 2
Provision target environment from clean baseline

The target machine receives fresh tokens and local configuration during provisioning. Nothing sensitive is copied from a controller machine.
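Fresh provisioning can be sketched as local token generation. The path and token format are hypothetical; the point is that the credential is minted on the target, never copied from a controller.

```python
import secrets
from pathlib import Path

def provision_token(state_root: Path) -> Path:
    """Mint a fresh credential on the target machine.

    Nothing is copied from a controller; the token exists only here.
    The path layout is illustrative.
    """
    token_path = state_root / "credentials" / "gateway.token"
    token_path.parent.mkdir(parents=True, exist_ok=True)
    token_path.write_text(secrets.token_hex(32))
    token_path.chmod(0o600)  # owner-only access
    return token_path
```

Re-running this on an already-provisioned machine simply overwrites the token with a new one, which keeps provisioning idempotent.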

Step 3
Install gateway with platform-native service flow

Gateway install and startup follow the platform-native service flow, with no custom plist detours. This reduced drift and startup failures.

Step 4
Run validation and leakage checks

Post-deploy checks confirm tenant path isolation, token locality, and absence of fallback reads from controller-linked data.
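One of those checks can be sketched as a scan for controller-linked references in tenant state. The forbidden markers here are invented for illustration; real checks would target the platform's actual controller paths and fallback syntax.

```python
from pathlib import Path

# Assumed markers of controller-linked data; illustrative only.
FORBIDDEN_MARKERS = ("/controller/", "fallback=controller")

def find_leakage(state_root: Path) -> list[str]:
    """Return names of files under the tenant state root that reference
    controller-linked data. An empty list passes the post-deploy gate."""
    findings = []
    for f in sorted(state_root.rglob("*")):
        if f.is_file():
            text = f.read_text(errors="ignore")
            if any(m in text for m in FORBIDDEN_MARKERS):
                findings.append(f.name)
    return findings
```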

Critical Fixes

Removed risky fallback behavior and remediated exposure

Dangerous fallback removed

A legacy fallback path could read controller-side data when target state was missing. That path was removed to enforce strict local-only data resolution per tenant machine.
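The shape of the fix can be illustrated as strict local-only resolution: where the legacy code would have fallen back to controller-side data on a miss, the hardened version raises. Names below are hypothetical.

```python
from pathlib import Path

class MissingTenantStateError(RuntimeError):
    """Raised on a local miss instead of falling back to controller data."""

def read_local_state(state_root: Path, name: str) -> str:
    """Resolve state strictly from the tenant machine.

    The dangerous behavior was an implicit fallback read when this file
    was missing; a miss is now a hard, visible error.
    """
    path = state_root / name
    if not path.is_file():
        raise MissingTenantStateError(f"missing local state: {path}")
    return path.read_text()
```

Failing loudly is the safer default: a missing file becomes an operational alert rather than a silent cross-boundary read.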

cleanup-leaked-data remediation

We ran the cleanup-leaked-data pass to purge affected artifacts and reset machine state where needed, bringing existing environments in line with the new isolation guarantees.

Sensitive-section stripping

Sensitive blocks are now stripped from deploy-bound outputs. The release pipeline no longer moves sections that can expose private operational context.
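Marker-based stripping can be sketched with a simple delimited-block filter. The `BEGIN`/`END SENSITIVE` markers are an assumption for illustration; the actual block syntax used by the release pipeline may differ.

```python
import re

# Marker syntax is illustrative; the real pipeline's block delimiters
# may differ.
SENSITIVE_BLOCK = re.compile(
    r"# BEGIN SENSITIVE\n.*?# END SENSITIVE\n", re.DOTALL
)

def strip_sensitive(text: str) -> str:
    """Remove marked sensitive blocks from deploy-bound output."""
    return SENSITIVE_BLOCK.sub("", text)
```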

Hardening pass shipped

The hardening milestone included 18 bug fixes focused on deployment correctness, state safety, and predictable fleet behavior under failure conditions.

Field Lessons

First real deployment on the Cosmo machine

The first full real-world deployment on the Cosmo machine surfaced practical rollout lessons: enforce standard gateway lifecycle commands, avoid custom service wrappers, keep provisioning idempotent, and validate tenant boundaries before enabling full workload automation. These lessons were folded into the default runbook for future machines.

Result: deployment became repeatable, safer to operate at fleet scale, and easier to audit.

Security Impact

Client isolation is now a default property, not a best effort.

With one-client-per-machine boundaries, code-only deploys, local fresh provisioning, and fallback removal, cross-tenant data leakage paths were eliminated from the deployment model.

Build a hardened deployment model with us