Quarantine Sandbox

The cloud-isolated aquarium — the blueprint thinks it owns the machine.

QUARANTINE SANDBOX — THE AQUARIUM

MODEL

The aquarium model

The quarantine sandbox, internally called the aquarium, is an isolated execution environment in which the blueprint believes it has full access to a real machine. In reality, every external resource is fake: stub resolvers answer DNS queries with NXDOMAIN, HTTP requests hit mock endpoints, the filesystem holds synthetic decoy data, and even the system clock can be manipulated.

The blueprint’s code runs exactly as it would in production, but its environment is a carefully constructed illusion. This lets us observe what the blueprint tries to do — not just what its source code says it does.

TIERS

Three tiers

Three quarantine tiers are defined. The tier for a run is selected automatically from the blueprint’s declared capabilities:

TIER	TECHNOLOGY	WHEN USED	ISOLATION LEVEL
T0	Wasmtime	Pure-compute blueprints with no I/O capabilities	Memory sandbox, no syscalls
T1	CF Containers + Landlock	Blueprints with filesystem or limited network access	Namespace isolation, LSM constraints
T2	Firecracker microVM	Blueprints with shell, broad network, or subagent capabilities	Full VM isolation, dedicated kernel

Higher tiers provide stronger isolation at higher cost. The tier selector picks the cheapest tier that fully contains the blueprint’s declared capabilities. If a blueprint’s capabilities span several tiers, it runs in the highest tier that any of them requires.
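The selection rule can be sketched in a few lines of Python. This is a minimal illustration, not the actual selector: the capability names in MIN_TIER are hypothetical stand-ins for whatever vocabulary the blueprint manifest uses, and failing closed to T2 on an unknown capability is an assumption.

```python
from enum import IntEnum


class Tier(IntEnum):
    T0 = 0  # Wasmtime
    T1 = 1  # CF Containers + Landlock
    T2 = 2  # Firecracker microVM


# Hypothetical capability names mapped to the lowest tier that contains them.
MIN_TIER = {
    "compute": Tier.T0,
    "filesystem": Tier.T1,
    "network.limited": Tier.T1,
    "shell": Tier.T2,
    "network.broad": Tier.T2,
    "subagent": Tier.T2,
}


def select_tier(capabilities):
    """Return the cheapest tier that contains every declared capability."""
    tier = Tier.T0
    for cap in capabilities:
        # Assumption: an unrecognised capability fails closed to the strongest tier.
        tier = max(tier, MIN_TIER.get(cap, Tier.T2))
    return tier
```

Under these assumptions, a blueprint declaring both "filesystem" and "shell" lands in T2, because the result is the maximum of the per-capability minimums.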

COUNTERFACTUAL RUNS

Counterfactual runs

Each blueprint is executed twice, in separate aquariums with different decoy sets. Run A sees decoy set A (fake API keys, synthetic user data, mock endpoints). Run B sees decoy set B (different fake keys, different synthetic data, different mocks). The decoy sets are decorrelated — they share no values.
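As an illustration of what "share no values" means in practice, the sketch below builds two decoy sets from fresh random tokens and checks that they are disjoint. The field names and value formats are invented for the example; the source does not specify what a real decoy set contains beyond fake keys, synthetic data, and mocks.

```python
import secrets


def make_decoy_set(label):
    """Build one set of fake values for a single aquarium run.

    Field names are hypothetical. Hostnames use the reserved .invalid TLD
    so they can never point at a real service.
    """
    return {
        "api_key": f"sk-decoy-{secrets.token_hex(16)}",
        "user_email": f"user-{secrets.token_hex(4)}@{label}.invalid",
        "mock_endpoint": f"https://mock-{secrets.token_hex(4)}.{label}.invalid/v1",
    }


def make_decorrelated_pair():
    """Generate sets A and B, retrying until they share no values."""
    while True:
        set_a, set_b = make_decoy_set("a"), make_decoy_set("b")
        if set(set_a.values()).isdisjoint(set_b.values()):
            return set_a, set_b
```

Random tokens make a collision extremely unlikely, but the explicit disjointness check keeps the guarantee independent of how the values are generated.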

The Execution Observer compares the observation bundles from both runs. A well-behaved blueprint should produce functionally equivalent outputs regardless of the decoy environment. If the blueprint’s behaviour diverges — for example, it exfiltrates data from one environment but not the other, or it produces different outputs depending on the API keys it finds — it is auto-flagged for human review.

This counterfactual design catches adversaries who tailor their behaviour to specific environments (A8 — sandbox detection) and time-bomb payloads (A6) that activate based on environmental signals.
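One way to picture the comparison step: replace every decoy value in each observation bundle with a placeholder, then check whether the two normalised bundles match. The sketch below assumes bundles are ordered lists of event strings, which is a simplification. Real bundles are presumably structured, and "functionally equivalent" is likely a looser test than the exact equality used here.

```python
def normalize(events, decoys):
    """Replace each decoy value with a placeholder named after its field."""
    # Longest values first, so a value that is a substring of another
    # does not get replaced prematurely.
    ordered = sorted(decoys.items(), key=lambda item: len(item[1]), reverse=True)
    normalized = []
    for event in events:
        for name, value in ordered:
            event = event.replace(value, f"<{name}>")
        normalized.append(event)
    return normalized


def diverges(bundle_a, decoys_a, bundle_b, decoys_b):
    """Return True when the two runs differ after decoy values are masked."""
    return normalize(bundle_a, decoys_a) != normalize(bundle_b, decoys_b)
```

With this masking, a blueprint that sends whichever key it finds to the same mock endpoint looks identical in both runs, while one that makes an extra request in only one run is reported as divergent.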

OBSERVATION

What the observer watches

EVENT TYPE	CAPTURED DATA	FLAG CONDITION
Syscall trace	System calls, arguments, return values	Unexpected syscalls for declared capability set
Network	DNS queries, HTTP requests, raw sockets	Any egress attempt (all network is mocked)
Filesystem	Reads, writes, permission changes	Access outside declared paths
Process	Spawned subprocesses, signals	Undeclared child processes
Clock	Time queries, sleep calls	Excessive sleeps (delay-based evasion)
Entropy	Random number generation	Crypto operations not matching declared capabilities
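
Three of the rows above (network, filesystem, clock) can be sketched as a simple rule pass over captured events. The event dictionary shape, the 30-second sleep budget, and the function name are all assumptions made for illustration; the table does not define a concrete threshold for "excessive" sleeps.

```python
import posixpath
from pathlib import PurePosixPath

MAX_TOTAL_SLEEP_S = 30.0  # hypothetical budget; the real threshold is not documented


def flag_events(events, declared_paths):
    """Return (event_type, detail) flags for a list of captured events."""
    flags = []
    total_sleep = 0.0
    roots = [PurePosixPath(p) for p in declared_paths]
    for ev in events:
        kind = ev["type"]
        if kind == "network":
            # Every egress attempt is flagged, since all network access is mocked.
            flags.append(("network", ev["target"]))
        elif kind == "filesystem":
            # normpath collapses ".." segments; symlinks are not resolved here.
            path = PurePosixPath(posixpath.normpath(ev["path"]))
            if not any(path == root or root in path.parents for root in roots):
                flags.append(("filesystem", ev["path"]))
        elif kind == "clock" and ev.get("call") == "sleep":
            total_sleep += ev["seconds"]
    if total_sleep > MAX_TOTAL_SLEEP_S:
        flags.append(("clock", f"slept {total_sleep:.0f}s"))
    return flags
```

Summing sleep time across the whole run, instead of checking each call on its own, is a design choice for the sketch: it also catches a blueprint that splits one long delay into many short ones.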