The cloud-isolated aquarium — the blueprint thinks it owns the machine.
The quarantine sandbox (internally called the aquarium) is an isolated execution environment in which the blueprint believes it has full access to a real machine. In reality, every external resource is fake: DNS queries are answered with NXDOMAIN by stub resolvers, HTTP requests hit mock endpoints, the filesystem contains synthetic decoy data, and even the system clock can be manipulated.
The blueprint’s code runs exactly as it would in production, but its environment is a carefully constructed illusion. This lets us observe what the blueprint tries to do — not just what its source code says it does.
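The stub-resolver behaviour can be made concrete. The following is a minimal sketch, assuming the aquarium's resolver receives plain DNS-over-UDP queries in RFC 1035 wire format and answers every one with NXDOMAIN; the function and constant names are hypothetical, not taken from the aquarium itself. It echoes the query's transaction ID and question, sets the response and recursion-available bits, and returns RCODE 3.

```python
import struct

# Masks for the 16-bit DNS header flags field (RFC 1035, section 4.1.1).
QR_RESPONSE = 0x8000
OPCODE_AND_RD = 0x7800 | 0x0100  # opcode (bits 11-14) and recursion-desired (bit 8)
RA = 0x0080                      # recursion available
RCODE_NXDOMAIN = 0x0003


def _question_end(msg: bytes, offset: int) -> int:
    """Return the offset just past one question entry (QNAME + QTYPE + QCLASS)."""
    while True:
        if offset >= len(msg):
            raise ValueError("truncated QNAME")
        length = msg[offset]
        if length == 0:      # zero-length root label terminates the name
            offset += 1
            break
        if length & 0xC0:    # compression pointers are not expected in queries
            raise ValueError("unexpected compressed name in query")
        offset += 1 + length
    end = offset + 4         # QTYPE (2 bytes) + QCLASS (2 bytes)
    if end > len(msg):
        raise ValueError("truncated question")
    return end


def nxdomain_response(query: bytes) -> bytes:
    """Answer a single-question DNS query with NXDOMAIN, echoing the question back."""
    if len(query) < 12:
        raise ValueError("query shorter than a DNS header")
    txid, flags, qdcount = struct.unpack("!HHH", query[:6])
    if qdcount != 1:
        raise ValueError("only single-question queries are handled in this sketch")
    question = query[12:_question_end(query, 12)]
    resp_flags = QR_RESPONSE | (flags & OPCODE_AND_RD) | RA | RCODE_NXDOMAIN
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, resp_flags, 1, 0, 0, 0)
    return header + question
```

A production resolver would presumably also add an SOA record to the authority section so clients can cache the negative answer, and would sit behind a UDP socket inside the sandbox; both are omitted here.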
Three quarantine tiers are defined, selected automatically based on the blueprint’s declared capabilities:
| TIER | TECHNOLOGY | WHEN USED | ISOLATION LEVEL |
|---|---|---|---|
| T0 | Wasmtime | Pure-compute blueprints with no I/O capabilities | Memory sandbox, no syscalls |
| T1 | CF Containers + Landlock | Blueprints with filesystem or limited network access | Namespace isolation, LSM constraints |
| T2 | Firecracker microVM | Blueprints with shell, broad network, or subagent capabilities | Full VM isolation, dedicated kernel |
Higher tiers provide stronger isolation at higher cost. The tier selector picks the cheapest tier that fully contains the blueprint’s declared capabilities. If a blueprint declares capabilities that belong to different tiers, it is promoted to the highest tier that any of them requires.
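The selection rule reduces to taking a maximum: each capability has a minimum tier that can contain it, and the blueprint gets the largest of those minimums. A minimal Python sketch, assuming a hypothetical capability vocabulary (`compute`, `filesystem`, `network.limited`, `network.broad`, `shell`, `subagent`) derived from the WHEN USED column; unknown capabilities are rejected rather than guessed at.

```python
from enum import IntEnum


class Tier(IntEnum):
    T0 = 0  # Wasmtime
    T1 = 1  # CF Containers + Landlock
    T2 = 2  # Firecracker microVM


# Hypothetical capability names mapped to the cheapest tier that can contain each one.
CAPABILITY_MIN_TIER = {
    "compute": Tier.T0,
    "filesystem": Tier.T1,
    "network.limited": Tier.T1,
    "network.broad": Tier.T2,
    "shell": Tier.T2,
    "subagent": Tier.T2,
}


def select_tier(capabilities) -> Tier:
    """Pick the cheapest tier that contains every declared capability."""
    caps = set(capabilities)
    unknown = caps - set(CAPABILITY_MIN_TIER)
    if unknown:
        # Fail closed: never run a blueprint whose capabilities we cannot classify.
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    # Spanning tiers promotes to the highest tier any single capability needs.
    return max((CAPABILITY_MIN_TIER[c] for c in caps), default=Tier.T0)
```

For example, a blueprint declaring `compute` and `filesystem` lands in T1, while adding `shell` promotes it to T2.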
Each blueprint is executed twice, in separate aquariums with different decoy sets. Run A sees decoy set A (fake API keys, synthetic user data, mock endpoints). Run B sees decoy set B (different fake keys, different synthetic data, different mocks). The decoy sets are decorrelated — they share no values.
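One way to obtain decorrelated decoy sets is to draw every value independently from a cryptographically secure generator and then confirm that the two sets are disjoint. This is a minimal sketch; the `DecoySet` fields and value formats are hypothetical stand-ins for the fake API keys, synthetic user data, and mock endpoints described in the paragraph above.

```python
import secrets
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class DecoySet:
    api_key: str        # hypothetical: fake credential exposed to the blueprint
    user_email: str     # hypothetical: synthetic user record
    mock_endpoint: str  # hypothetical: URL served by the mock HTTP layer


def make_decoy_set() -> DecoySet:
    """Draw every value independently from the OS CSPRNG via the secrets module."""
    return DecoySet(
        api_key=secrets.token_urlsafe(32),
        user_email=f"{secrets.token_hex(5)}@{secrets.token_hex(4)}.example",
        mock_endpoint=f"https://api.{secrets.token_hex(4)}.example/v1",
    )


def make_decoy_pair() -> tuple:
    """Return (set_a, set_b) guaranteed to share no field values."""
    set_a = make_decoy_set()
    while True:
        set_b = make_decoy_set()
        # A collision is astronomically unlikely, but the guarantee is cheap to enforce.
        if set(astuple(set_a)).isdisjoint(astuple(set_b)):
            return set_a, set_b
```

The disjointness check compares whole values only. Real decoys would presumably be shaped to look like genuine credentials and hostnames, since obviously fake values (such as the reserved `.example` domain used here) could themselves help a blueprint detect the sandbox.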
The Execution Observer compares the observation bundles from both runs. A well-behaved blueprint should produce functionally equivalent outputs regardless of the decoy environment. If the blueprint’s behaviour diverges — for example, it exfiltrates data from one environment but not the other, or it produces different outputs depending on the API keys it finds — it is auto-flagged for human review.
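The comparison can be approximated by rewriting each run's decoy values to role placeholders and diffing what remains: any difference left over cannot be explained by the decoys alone. A minimal sketch, assuming observation bundles are ordered lists of event strings and that both decoy sets use the same role names; `normalize` and `compare_bundles` are hypothetical helpers, and a literal diff is a stricter stand-in for "functionally equivalent".

```python
from itertools import zip_longest
from typing import Dict, List, Tuple


def normalize(events: List[str], decoys: Dict[str, str]) -> List[str]:
    """Replace each run-specific decoy value with its role name, e.g. '<api_key>'."""
    # Longest values first, so a value containing another decoy is replaced whole.
    ordered = sorted(decoys.items(), key=lambda kv: len(kv[1]), reverse=True)
    out = []
    for event in events:
        for role, value in ordered:
            event = event.replace(value, f"<{role}>")
        out.append(event)
    return out


def compare_bundles(
    events_a: List[str], decoys_a: Dict[str, str],
    events_b: List[str], decoys_b: Dict[str, str],
) -> Tuple[bool, List[Tuple[str, str]]]:
    """Return (diverged, mismatched event pairs) for two runs' observation events."""
    norm_a = normalize(events_a, decoys_a)
    norm_b = normalize(events_b, decoys_b)
    mismatches = [
        (x, y) for x, y in zip_longest(norm_a, norm_b, fillvalue="<missing>") if x != y
    ]
    return bool(mismatches), mismatches
```

Because events are compared in order, legitimately nondeterministic behaviour (thread scheduling, unordered directory listings) would surface as false positives; a real observer would likely need a more tolerant equivalence check before flagging.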
This counterfactual design catches adversaries who tailor their behaviour to specific environments (A8 — sandbox detection) and time-bomb payloads (A6) that activate based on environmental signals.
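Each observation bundle records the following event types; an event that meets its flag threshold is flagged:
| EVENT TYPE | CAPTURED DATA | FLAG THRESHOLD |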
|---|---|---|
| Syscall trace | System calls, arguments, return values | Unexpected syscalls for declared capability set |
| Network | DNS queries, HTTP requests, raw sockets | Any egress attempt (all network traffic is mocked) |
| Filesystem | Reads, writes, permission changes | Access outside declared paths |
| Process | Spawned subprocesses, signals | Undeclared child processes |
| Clock | Time queries, sleep calls | Excessive sleeps (delay-based evasion) |
| Entropy | Random number generation | Crypto operations not matching declared capabilities |
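The flag thresholds in the table translate naturally into per-event rules checked against the blueprint's declarations. A minimal sketch, assuming a hypothetical `Declared` manifest (allowed syscalls, filesystem roots, child processes, a crypto flag, and a sleep budget) and a simple `Event` record; the 5-second sleep budget and the `crypto.` prefix convention for entropy events are illustrative assumptions, not values from the design.

```python
import posixpath
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List, Set


@dataclass
class Event:
    kind: str           # "syscall" | "network" | "filesystem" | "process" | "clock" | "entropy"
    detail: str         # syscall name, URL, path, command, "sleep", or RNG/crypto op
    value: float = 0.0  # sleep duration in seconds for clock events


@dataclass
class Declared:  # hypothetical capability manifest
    syscalls: Set[str] = field(default_factory=set)
    paths: List[str] = field(default_factory=list)
    child_processes: Set[str] = field(default_factory=set)
    crypto: bool = False
    sleep_budget_s: float = 5.0  # illustrative threshold


def flag_events(events: List[Event], declared: Declared) -> List[str]:
    flags: List[str] = []
    total_sleep, sleep_flagged = 0.0, False
    for ev in events:
        if ev.kind == "syscall" and ev.detail not in declared.syscalls:
            flags.append(f"unexpected syscall: {ev.detail}")
        elif ev.kind == "network":
            # All network is mocked, so any egress attempt is flagged.
            flags.append(f"egress attempt: {ev.detail}")
        elif ev.kind == "filesystem":
            # Collapse '..' before the prefix check (is_relative_to needs Python 3.9+).
            path = PurePosixPath(posixpath.normpath(ev.detail))
            if not any(path.is_relative_to(root) for root in declared.paths):
                flags.append(f"access outside declared paths: {ev.detail}")
        elif ev.kind == "process" and ev.detail not in declared.child_processes:
            flags.append(f"undeclared child process: {ev.detail}")
        elif ev.kind == "clock" and ev.detail == "sleep":
            total_sleep += ev.value
            # Cumulative sleep catches delay-based evasion split across many calls.
            if total_sleep > declared.sleep_budget_s and not sleep_flagged:
                sleep_flagged = True
                flags.append(f"excessive sleep: {total_sleep:.1f}s total")
        elif ev.kind == "entropy" and ev.detail.startswith("crypto.") and not declared.crypto:
            flags.append(f"undeclared crypto operation: {ev.detail}")
    return flags
```

Summing sleep time, rather than checking each call, is a deliberate choice in this sketch: a payload that waits out an analysis window with many short sleeps still crosses the budget.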