Zero Trust for AI Agents: Why Implicit Trust Will Burn You

The zero-trust security model is not new. The concept — never trust, always verify — has been the governing principle for enterprise network architecture since Google published BeyondCorp in 2014. A decade later, most large organisations have moved past the perimeter model. Identity is verified at every request. Access is granted on a least-privilege basis. Every action is logged. The walls came down and the checkpoints went up.

Then AI agents entered the picture, and the enterprise security posture regressed by fifteen years.

The Implicit Trust Problem

When an AI agent calls a tool today, a remarkable thing happens: the tool is trusted completely. There is no signature check. There is no capability restriction. There is no sandbox. The tool runs with whatever permissions the host process has — which, for an agent managing cloud infrastructure or customer data, can be everything.

This is implicit trust. It is the security model of a time before supply-chain attacks, before dependency confusion, before the realisation that the most dangerous code is the code you didn't write yourself.

The problem is not that individual tools are malicious. Most are not. The problem is that the architecture provides no way to distinguish between a legitimate tool and a compromised one. When every tool is trusted equally, the blast radius of a single compromised package is bounded only by the permissions of the host process.

Consider the attack surface of a typical enterprise AI agent deployment:

The tool itself. Did the code you're running come from the author you think it did? Has it been modified since publication? There's no signature to check.
The tool's dependencies. If the tool pulls in third-party libraries (and most do), those libraries have their own supply chains. A compromised transitive dependency is indistinguishable from a legitimate one.
The tool's runtime environment. The tool runs in the same process or on the same host as your agent. It can read files, make network calls, access environment variables — all without restriction.
The data flowing through the tool. Customer PII, API credentials, internal system identifiers — all of it passes through the tool in cleartext, with no filtering and no audit trail.

This is not a hypothetical risk assessment. It is the current production reality for most organisations running AI agents.

What Zero Trust Means for AI Tooling

Applying zero-trust principles to AI tool execution is not a metaphor. It is a direct mapping of the same principles that transformed network security, applied to a new trust boundary.

Verify Explicitly

In a zero-trust network, every request is authenticated and authorised, regardless of where it originates. The equivalent for AI tooling: every tool is cryptographically verified before execution, regardless of where it was installed from or how long it has been on the system.

This means:

Cryptographic signatures. Every tool package must be signed by its publisher using a strong asymmetric algorithm. Before execution, the host runtime verifies the signature against a known public key. If verification fails (the package was tampered with, the publisher's key has been revoked, or the package is unsigned), execution is refused.
Content hashing. The integrity of the entire package is verified by computing a SHA-256 hash over its contents and comparing it to the hash that was signed. A single changed byte anywhere in the package invalidates the signature.
Publisher identity. The signing key is tied to a verified publisher identity. The host can determine not just that the package is intact, but that it was produced by a specific, known entity.

MPP implements exactly this model. Every .mpp package is signed with Ed25519. The Gatekeeper verification pipeline recomputes the content hash and verifies the publisher's signature before any tool code executes. Checking an Ed25519 signature takes microseconds, and hashing adds time proportional to package size. The cost of skipping verification is incalculable.
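To make the verify-before-execute order concrete, here is a minimal Python sketch of the content-hash half of such a pipeline. It assumes a package is a mapping of file paths to bytes and that the signature check is supplied as a function; `package_digest`, `gatekeeper_check` and `VerificationError` are hypothetical names, not MPP's API, and the canonical encoding is one reasonable choice rather than the `.mpp` format. In practice `verify` would wrap a vetted Ed25519 implementation, for example `Ed25519PublicKey.verify` from the `cryptography` package, which raises `InvalidSignature` on a mismatch.

```python
import hashlib
from typing import Callable, Mapping

# Hypothetical hook: verify(public_key, message, signature) -> bool.
# A real host would back this with a vetted Ed25519 library.
VerifyFn = Callable[[bytes, bytes, bytes], bool]


class VerificationError(Exception):
    """Raised when a package must not be executed."""


def package_digest(files: Mapping[str, bytes]) -> bytes:
    """SHA-256 over every file in the package, in a canonical order.

    Paths are sorted so the digest does not depend on archive order, and
    each path and body is length-prefixed so bytes cannot be shifted
    between a file name and its contents without changing the digest.
    """
    h = hashlib.sha256()
    for path in sorted(files):
        name = path.encode("utf-8")
        body = files[path]
        h.update(len(name).to_bytes(8, "big"))
        h.update(name)
        h.update(len(body).to_bytes(8, "big"))
        h.update(body)
    return h.digest()


def gatekeeper_check(
    files: Mapping[str, bytes],
    signature: bytes,
    publisher_key: bytes,
    verify: VerifyFn,
) -> bytes:
    """Refuse the package unless the publisher's signature covers this exact content."""
    if not signature:
        raise VerificationError("unsigned package")
    digest = package_digest(files)
    if not verify(publisher_key, digest, signature):
        raise VerificationError("signature does not match package contents")
    return digest  # only after this returns may the host load the tool
```

Because the signature covers the digest, changing any byte of any file, renaming a file, or adding or removing one yields a different digest, and the check fails before any tool code is loaded.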

Least Privilege

In a zero-trust network, access is scoped to the minimum necessary for each request. The equivalent for AI tooling: every tool declares exactly what resources it needs, and the runtime refuses everything else.

Today, when an MCP tool server starts, it inherits the full permissions of its host process. A tool that checks the weather has the same system access as a tool that manages your cloud provider's IAM policies. There is no granularity. There is no restriction.

MPP's capability system inverts this model entirely. A tool's manifest declares every resource it needs:

```yaml
capabilities:
  network: ["api.github.com"]
  filesystem:
    read: ["/data/inputs"]
    write: ["/data/outputs"]
  env_vars: ["GITHUB_TOKEN"]
```

The runtime enforces these declarations at the boundary. A tool that declares access to api.github.com cannot connect to api.stripe.com. A tool that declares read access to /data/inputs cannot open /etc/passwd. Undeclared resources are not merely denied — they are invisible to the tool.
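A default-deny check against a manifest like the one above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MPP's permission engine: the `Capabilities` type and the `check` and `visible_env` helpers are hypothetical, network targets are matched as exact host names, and filesystem paths are matched by whole path components so that `/data/inputs-evil` does not pass as `/data/inputs`. Anything not listed, including unknown resource kinds, is denied.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable, Mapping


@dataclass(frozen=True)
class Capabilities:
    """Mirror of the manifest's capabilities block (hypothetical structure)."""
    network: frozenset = frozenset()
    fs_read: tuple = ()
    fs_write: tuple = ()
    env_vars: frozenset = frozenset()


def _under(path: str, roots: Iterable[str]) -> bool:
    """True if `path` is one of `roots` or lies beneath one of them."""
    p = PurePosixPath(path)
    # Reject relative paths and any ".." component instead of trying to resolve them.
    if not p.is_absolute() or ".." in p.parts:
        return False
    return any(p == PurePosixPath(r) or PurePosixPath(r) in p.parents for r in roots)


def check(caps: Capabilities, kind: str, target: str) -> bool:
    """Default-deny permission check for a single resource access."""
    if kind == "network":
        return target in caps.network  # exact host match; no wildcards
    if kind == "fs_read":
        return _under(target, caps.fs_read)
    if kind == "fs_write":
        return _under(target, caps.fs_write)
    if kind == "env":
        return target in caps.env_vars
    return False  # unknown resource kinds are denied


def visible_env(caps: Capabilities, environ: Mapping[str, str]) -> dict:
    """Build the tool's environment from declared variables only; others never reach it."""
    return {k: v for k, v in environ.items() if k in caps.env_vars}


github_tool = Capabilities(
    network=frozenset({"api.github.com"}),
    fs_read=("/data/inputs",),
    fs_write=("/data/outputs",),
    env_vars=frozenset({"GITHUB_TOKEN"}),
)
```

`visible_env` shows the "invisible" half of the model for environment variables: an undeclared variable is not refused at read time, it is simply never placed in the tool's environment.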

Assume Breach

In a zero-trust network, every component is designed to limit blast radius under the assumption that compromise will occur. The equivalent for AI tooling: even if a tool is compromised, the damage it can do is contained by the sandbox, bounded by its declared capabilities, and visible through the audit log.

MPP achieves this through WebAssembly sandboxing. Each tool executes in an isolated WASM module with its own linear memory space. The tool cannot read the host's heap, cannot access the operating system directly, and cannot do anything outside the capabilities it was granted. When the invocation completes, the module is torn down. No persistent state. Its only path to the host is the set of functions the runtime exposes for those capabilities.

If a tool is compromised, the worst it can do is misuse the specific capabilities it was granted — and every action is recorded in a tamper-evident audit log.

The Supply-Chain Attack You Haven't Had Yet

Enterprise security teams understand supply-chain risk in the context of npm, PyPI, and container registries. They have invested in dependency scanning, software bill of materials (SBOM), and vulnerability management. But most have not extended this thinking to AI tool registries — because AI tool registries are new, and the scale of adoption hasn't yet attracted the volume of attacks that forced the traditional package ecosystem to mature.

That window is closing.

As AI tool repositories grow, they become valuable targets for the same attack patterns that have plagued every other package ecosystem:

Typosquatting. An attacker publishes a tool with a name one character off from a popular tool. The AI agent — or the developer installing tools for the agent — doesn't notice the difference.

Dependency confusion. An attacker publishes a tool with the same name as an internal tool on a public registry. The resolution algorithm downloads the public (malicious) version instead of the internal one.

Account takeover. An attacker compromises a tool publisher's credentials and pushes a malicious update to a widely-used tool. Without signature verification, every consumer of that tool silently installs the compromised version.

Prompt injection via tool metadata. This one is unique to AI tooling. An attacker crafts a tool description that contains instructions for the model — not the user. The description says "fetch the weather" but contains hidden instructions to exfiltrate the conversation to a remote server. MCP's own documentation acknowledges this vector.

Every one of these attacks succeeds because the tool execution layer has no verification step. There is no signature to check. There is no content hash to validate. There is no declared capability set to enforce. The tool arrives, and the tool runs.

MPP closes each of these vectors:

| Attack | MPP Mitigation |
|--------|----------------|
| Tampered package | Ed25519 signature + SHA-256 content hash |
| Typosquatting | Publisher identity verification via registry |
| Dependency confusion | Package scoping tied to verified publishers |
| Account takeover + malicious update | Signature verification catches key mismatch |
| Prompt injection via metadata | HITL confirmation for non-trivial tools |
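The scoping and key-pinning rows of the table can be illustrated with a small resolver. This sketch assumes scoped names such as `@acme/deploy`, a host-side map from each scope to the fingerprint of its verified publisher key, and candidate packages gathered from several registries; `Candidate`, `resolve` and the fingerprint strings are hypothetical. A candidate is accepted only if its name matches exactly and it was signed by the pinned key for its scope, so a same-named public package, an update signed by a substituted key, and a lookalike scope are all rejected.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass(frozen=True)
class Candidate:
    name: str           # scoped package name, e.g. "@acme/deploy"
    registry: str       # where this copy was found
    publisher_key: str  # fingerprint of the key that signed it (hypothetical format)


def resolve(name: str, candidates: Iterable[Candidate], pinned: Mapping[str, str]) -> Candidate:
    """Return the candidate signed by the pinned publisher key for the name's scope."""
    scope, sep, _ = name.partition("/")
    if not sep or not scope.startswith("@"):
        raise LookupError(f"{name!r} is not a scoped package name")
    expected = pinned.get(scope)
    if expected is None:
        raise LookupError(f"no verified publisher pinned for scope {scope!r}")
    for c in candidates:
        # A matching name is never enough: the signing key must match the pin.
        if c.name == name and c.publisher_key == expected:
            return c
    raise LookupError(f"no copy of {name!r} is signed by the pinned key for {scope!r}")
```

Resolution here never falls back to "whichever registry answered first", which is the behaviour dependency confusion exploits.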

The Audit Question

There is a question that every CISO will eventually ask about their organisation's AI agent deployment: What tools did the agent use, what did it access, and what data flowed through it?

Under the current model, the answer is: we don't know. Tool invocations are not systematically logged. Capability usage is not recorded because capabilities are not declared. Data flowing through tools is not filtered or tracked.

MPP's append-only audit log changes this. Every invocation writes a structured record: which package ran, which capabilities were used, what input it received, what output it returned, and the result of every permission check. Entries are hash-chained — each entry includes the hash of the previous entry — so that any modification or deletion of log entries is detectable.
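The hash chaining described above can be sketched with the standard library. The record fields and the `AuditLog` class are illustrative assumptions, not MPP's log format. Records are serialised as canonical JSON (sorted keys, fixed separators) so the same record always hashes the same way, and each entry's hash covers the previous entry's hash plus its own record.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """SHA-256 over the previous entry's hash and this entry's canonical JSON record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only, hash-chained log (in memory, for illustration)."""

    def __init__(self) -> None:
        self.entries: list = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        record = copy.deepcopy(record)  # later changes by the caller must not alter the log
        h = entry_hash(prev, record)
        self.entries.append({"prev": prev, "record": record, "hash": h})
        return h

    def verify(self) -> bool:
        """Recompute the chain; an edited, reordered or removed earlier entry breaks it."""
        prev = GENESIS
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != entry_hash(prev, e["record"]):
                return False
            prev = e["hash"]
        return True
```

One caveat the chain alone does not cover: dropping the newest entries leaves a shorter chain that still verifies, so a deployment would also record the latest hash somewhere the log writer cannot rewrite and compare against it.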

This is not a nice-to-have feature. It is the difference between an AI agent deployment that can survive a security audit and one that cannot.

The Compliance Dimension

Regulatory frameworks are tightening around AI. The EU AI Act requires transparency and human oversight for high-risk AI systems. GDPR requires data minimisation and documented processing. HIPAA mandates access controls and audit trails for protected health information.

None of these requirements can be met by an AI tool execution model based on implicit trust. You cannot demonstrate transparency when you don't know what tools your agent used. You cannot claim data minimisation when PII flows freely through unfiltered tool responses. You cannot produce an audit trail when invocations aren't logged.

MPP provides the infrastructure that makes compliance an architectural property rather than an afterthought:

Transparency: Every tool's capabilities and behaviour are declared in a machine-readable manifest.
Human oversight: The HITL system ensures high-sensitivity operations require explicit human approval.
Data minimisation: Privacy filters redact PII from tool responses before they reach the model context.
Audit trails: Hash-chained logs record every invocation, capability usage, and permission decision.
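A privacy filter of the kind listed above can be approximated by pattern-based redaction applied to a tool's response before it is handed to the model. The two patterns (email addresses and US Social Security numbers) and the `redact` helper are illustrative assumptions, not MPP's filter set; pattern matching only catches identifiers with a predictable shape, such as these two.

```python
import re
from typing import Any

# Illustrative patterns only; a real deployment needs a broader, tested set.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(value: Any) -> Any:
    """Recursively replace PII matches in a JSON-like tool response, returning a new object."""
    if isinstance(value, str):
        for label, pattern in PII_PATTERNS.items():
            value = pattern.sub(f"[REDACTED_{label}]", value)
        return value
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value  # numbers, booleans and None pass through unchanged
```

Walking the whole structure, rather than only top-level strings, matters because tool responses often nest customer records several levels deep.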

What Moving to Zero Trust Looks Like

Adopting zero trust for AI tooling does not require replacing your agent framework or rewriting your tools from scratch. MPP is designed to layer into the existing stack:

Existing tool logic stays the same. The business logic of your tools doesn't change. You compile it to WebAssembly and package it with a manifest.
MCP compatibility is preserved. MPP uses the same JSON-RPC 2.0 protocol as MCP. Agents that speak MCP can invoke MPP tools through a conforming host runtime.
Deployment is incremental. You can start by packaging your highest-risk tools (those with network access, filesystem writes, or credential access) and expand coverage over time.
The security pipeline is automatic. Once a tool is packaged, the Gatekeeper, permission engine, sandbox, and privacy filters run on every invocation without manual intervention.
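To show how an unchanged MCP request can pass through such a pipeline, here is a minimal sketch of a host handling a JSON-RPC 2.0 `tools/call` request, the method MCP clients use to invoke a tool. The `handle` function, the check and filter hooks, and the deny-by-raising convention are hypothetical. The error codes -32700, -32600, -32601 and -32602 are the standard JSON-RPC 2.0 codes, and a denied call is reported inside the result with `isError: true`, following MCP's convention for tool-level failures. Batch requests and notifications are not handled.

```python
import json
from typing import Callable, Sequence

# Hypothetical pre-execution check: raise PermissionError to deny the call.
Check = Callable[[str, dict], None]


def _reply(rid, **body) -> str:
    return json.dumps({"jsonrpc": "2.0", "id": rid, **body})


def handle(
    raw: str,
    checks: Sequence[Check],
    run_tool: Callable[[str, dict], str],
    filter_output: Callable[[str], str],
) -> str:
    """Validate a tools/call request, run every check, execute, then filter the output."""
    try:
        req = json.loads(raw)
    except json.JSONDecodeError:
        return _reply(None, error={"code": -32700, "message": "Parse error"})
    if not isinstance(req, dict):
        return _reply(None, error={"code": -32600, "message": "Invalid Request"})
    rid = req.get("id")
    if req.get("jsonrpc") != "2.0" or not isinstance(req.get("method"), str):
        return _reply(rid, error={"code": -32600, "message": "Invalid Request"})
    if req["method"] != "tools/call":
        return _reply(rid, error={"code": -32601, "message": "Method not found"})
    params = req.get("params")
    if not isinstance(params, dict):
        return _reply(rid, error={"code": -32602, "message": "Invalid params"})
    name, args = params.get("name"), params.get("arguments", {})
    if not isinstance(name, str) or not isinstance(args, dict):
        return _reply(rid, error={"code": -32602, "message": "Invalid params"})
    try:
        for check in checks:  # e.g. signature, permissions, human approval
            check(name, args)
    except PermissionError as exc:
        return _reply(rid, result={"content": [{"type": "text", "text": str(exc)}], "isError": True})
    text = filter_output(run_tool(name, args))  # e.g. privacy redaction
    return _reply(rid, result={"content": [{"type": "text", "text": text}]})
```

The agent sees an ordinary MCP response either way; the checks and the output filter run inside the host on every call, which is what lets existing MCP clients benefit without modification.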

The alternative is to wait. To run AI agents in production with implicit trust and hope that the first supply-chain attack lands on someone else. To explain to your auditors that you don't have logs because the architecture doesn't support them. To discover, after the fact, that a tool with read access to your customer database was also phoning home to an undeclared domain.

Zero trust exists because the perimeter model failed. Implicit trust for AI tools will fail for the same reasons — and the organisations that moved first will be the ones that didn't have to learn the lesson the hard way.

MPP is a licensed protocol for secure AI tool execution, developed by Quantum 2x. The technical documentation is available at /docs. Read the companion post: Introducing MPP: Containerization for AI Tool Execution.