Audit Logs That Actually Work: Tamper-Evident Logging for AI Agent Actions

There is a question that surfaces in every enterprise AI deployment, usually after the third or fourth incident meeting: Can you tell me exactly what the agent did?

Not "roughly what happened." Not "the agent called some tools." The question is specific: which tool, which version, which publisher, what input, what output, what permissions were granted, who approved the execution, and when — down to the second.

Under current AI agent architectures, the answer is almost always no. Tool invocations are not systematically logged. When they are logged, they capture application-level output (what the tool printed) but not operational metadata (what the tool was allowed to do). And the logs themselves are ordinary files or database rows — mutable, deletable, and carrying no guarantee that what the auditor reads is what actually happened.

MPP's audit logging is designed to answer the question that traditional logging cannot.

What's Wrong with Traditional Logging

Enterprise logging infrastructure is mature. Teams have ELK stacks, Splunk instances, CloudWatch, Datadog. They capture application logs, access logs, system logs. But these systems were designed for debugging and monitoring — not for producing evidence about autonomous agent actions.

The gaps become apparent when you try to use traditional logs for AI agent governance:

Gap 1: What Was Logged Is Not What Matters

Traditional application logs capture what the tool outputs — stdout, stderr, maybe structured JSON messages. They do not capture:

Package identity. Which specific signed artifact ran? What was its content hash? Was the signature valid?
Permission decisions. What capabilities were requested? What policy decision was made? Was a human involved?
Capability enforcement. Which network domains, filesystem paths, and environment variables were actually available to the tool?
Privacy filter actions. Was PII detected in the response? Which patterns matched? What was redacted?
Attestation chain. Was an attestation token issued? When did it expire? Was the nonce tracked for replay prevention?

Without this metadata, you have logs that say "tool ran at 14:32" but cannot answer "was the tool verified, what could it access, and was PII protected?"

Gap 2: Logs Are Mutable

Standard log files and database entries can be modified. An attacker who compromises a system can delete or alter log entries to cover their tracks. A well-intentioned administrator can accidentally truncate a log file. A misconfigured rotation policy can discard entries before they are needed.

For operational monitoring, mutability is acceptable — you care about the current state, not historical proof. For compliance and incident response, mutability undermines the entire purpose of logging. If you cannot prove that the log entries have not been tampered with, the logs have limited evidentiary value.

Gap 3: No Structured Schema for AI Actions

Every tool produces different log output. There is no standard schema for "an AI agent invoked a tool." Aggregating and querying across thousands of tool invocations from different publishers, with different log formats, requires custom parsing and normalisation — work that is fragile and error-prone.

How MPP Audit Logging Works

MPP's audit log is designed around three properties: completeness (every invocation is logged with full operational metadata), tamper evidence (entries are hash-chained so tampering is detectable), and structure (every entry follows a defined schema that can be queried programmatically).

What Gets Logged

Every tool invocation produces a structured audit record containing:

| Field | Description |
|-------|-------------|
| timestamp | ISO 8601 timestamp of the invocation |
| package_id | Unique identifier of the tool package |
| package_version | SemVer version of the package |
| content_hash | SHA-256 hash of the verified package contents |
| signature_valid | Whether the Ed25519 signature was valid |
| publisher_key_id | Key ID of the publisher (pub_...) |
| tool_name | Which tool within the package was invoked |
| security_level | Declared security level (low/medium/high/critical) |
| sensitivity_score | Computed sensitivity score (0–100) |
| confirmation_level | Required confirmation level (none/notify/confirm/multifactor) |
| policy_decision | Granted, RequiresConfirmation, or Denied |
| hitl_approved | Whether a human approved the invocation (if applicable) |
| hitl_approver | Identity of the approver (if applicable) |
| capabilities_requested | List of capabilities declared in the manifest |
| capabilities_granted | List of capabilities actually granted by the token |
| attestation_id | Unique ID of the attestation token |
| attestation_expires | Expiry timestamp of the attestation token |
| input_hash | SHA-256 hash of the tool input (not the input itself, for privacy) |
| output_hash | SHA-256 hash of the tool output (post-privacy-filtering) |
| privacy_filters_applied | List of privacy patterns that matched and redacted data |
| execution_time_ms | Wall-clock execution time |
| memory_used_bytes | Peak memory consumption of the WASM module |
| exit_status | Success or error (with error code) |
| previous_entry_hash | Hash of the previous log entry (chain link) |
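
To make the schema concrete, here is a minimal sketch of how one record might be represented in Python. The class name `AuditRecord`, the Python types, the validation rules, and the JSON encoding are assumptions made for illustration; only the field names and the allowed values come from the table above, and this is not MPP's actual data model.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple

# Allowed values taken from the field descriptions in the table.
POLICY_DECISIONS = {"Granted", "RequiresConfirmation", "Denied"}


@dataclass(frozen=True)
class AuditRecord:
    """Hypothetical in-memory shape of one audit entry."""
    timestamp: str                      # ISO 8601
    package_id: str
    package_version: str                # SemVer
    content_hash: str                   # hex SHA-256 of the package contents
    signature_valid: bool
    publisher_key_id: str
    tool_name: str
    security_level: str                 # low / medium / high / critical
    sensitivity_score: int              # 0 to 100
    confirmation_level: str             # none / notify / confirm / multifactor
    policy_decision: str                # one of POLICY_DECISIONS
    capabilities_requested: Tuple[str, ...]
    capabilities_granted: Tuple[str, ...]
    attestation_id: str
    attestation_expires: str
    input_hash: str                     # hash only, never the raw input
    output_hash: str                    # hash of the post-filtering output
    privacy_filters_applied: Tuple[str, ...]
    execution_time_ms: int
    memory_used_bytes: int
    exit_status: str
    previous_entry_hash: Optional[str]  # None for the first entry
    hitl_approved: Optional[bool] = None   # only set when a human was involved
    hitl_approver: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values the schema does not allow (assumed validation rules).
        if not 0 <= self.sensitivity_score <= 100:
            raise ValueError("sensitivity_score must be between 0 and 100")
        if self.policy_decision not in POLICY_DECISIONS:
            raise ValueError(f"unknown policy_decision: {self.policy_decision}")

    def to_json(self) -> str:
        """Serialise with sorted keys so equal records encode identically."""
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
```

Freezing the dataclass means a record cannot be altered in memory after it is built, which suits data that is about to be hashed and chained.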

This is not application logging. This is operational metadata that captures everything the compliance team, the security team, and the incident responder need to reconstruct what happened and why.

Hash Chaining

The critical property that distinguishes MPP's audit log from a conventional log file is hash chaining. Each log entry includes the SHA-256 hash of the previous entry:

Entry 1: { data, previous_entry_hash: null }
    hash_1 = SHA-256(Entry 1)

Entry 2: { data, previous_entry_hash: hash_1 }
    hash_2 = SHA-256(Entry 2)

Entry 3: { data, previous_entry_hash: hash_2 }
    hash_3 = SHA-256(Entry 3)

This creates a linked chain in which changing an entry's data, inserting a new entry, or deleting an entry breaks a link: the recomputed hash of the affected entry no longer matches the previous_entry_hash stored in the entry that follows it.
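
The append step can be sketched in a few lines of Python. The helper names `entry_hash` and `append_entry` are hypothetical, and hashing a canonical JSON encoding (sorted keys, no extra whitespace) is an assumption: the text does not specify how MPP serialises an entry before hashing, only that SHA-256 is used.

```python
import hashlib
import json
from typing import List, Optional


def entry_hash(entry: dict) -> str:
    """SHA-256 over canonical JSON (an assumed encoding, chosen so hashes are repeatable)."""
    encoded = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log: List[dict], data: dict) -> dict:
    """Link a new entry to the current tail of the log and append it."""
    previous: Optional[str] = entry_hash(log[-1]) if log else None
    entry = {**data, "previous_entry_hash": previous}
    log.append(entry)
    return entry


log: List[dict] = []
append_entry(log, {"tool_name": "search", "timestamp": "2024-01-01T14:32:00Z"})
append_entry(log, {"tool_name": "fetch", "timestamp": "2024-01-01T14:32:05Z"})
```

Whatever encoding is used, it has to be deterministic. If the same entry can serialise to two different byte strings, verification will report breaks that are not tampering.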

Detecting tampering is a simple linear scan:

For each entry N after the first:
    recomputed_hash = SHA-256(entry N-1)
    if recomputed_hash != entry N.previous_entry_hash:
        TAMPER DETECTED at the link between entry N-1 and entry N

This is the same principle used in blockchain technology and in Git, whose commits reference their parents by hash. The difference is that MPP's audit log is append-only and sequential: there is no consensus mechanism and no distributed agreement, so it avoids the overhead those bring. It is a linked list of hashes, and both appending and verifying cost one SHA-256 computation per entry.
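
A runnable version of the linear scan might look like the following. The function name `first_broken_link` and the canonical-JSON hashing in `entry_hash` are assumptions for illustration, not MPP's documented implementation.

```python
import hashlib
import json
from typing import List, Optional


def entry_hash(entry: dict) -> str:
    """SHA-256 over canonical JSON (an assumed encoding)."""
    encoded = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def first_broken_link(log: List[dict]) -> Optional[int]:
    """Return the index of the first entry whose stored link is wrong, or None."""
    if log and log[0].get("previous_entry_hash") is not None:
        return 0  # the first entry must not point at a predecessor
    for n in range(1, len(log)):
        if log[n].get("previous_entry_hash") != entry_hash(log[n - 1]):
            return n
    return None


# Build a three-entry chain, then alter the middle entry.
log: List[dict] = []
for name in ("search", "fetch", "summarise"):
    previous = entry_hash(log[-1]) if log else None
    log.append({"tool_name": name, "previous_entry_hash": previous})

intact = first_broken_link(log)      # None: every link matches
log[1]["tool_name"] = "exfiltrate"   # tamper with entry 1
broken_at = first_broken_link(log)   # 2: entry 2's link no longer matches entry 1
```

The scan reports the entry after the altered one, because that is where the stored hash stops matching. A change to the final entry is not caught by this scan alone, since no later entry stores its hash.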

What Hash Chaining Does Not Provide

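Hash chaining makes tampering detectable, not impossible. An attacker with write access to the log can still modify entries, and the modification will be visible to anyone who verifies the chain, unless the attacker also recomputes the previous_entry_hash of every later entry. A chain checked only against itself also cannot reveal that entries were removed from the end. This is the same kind of guarantee as a tamper-evident seal on a package: it doesn't prevent opening, but it shows the package was opened, provided the seal itself cannot be quietly replaced.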

For stronger guarantees, the audit log can be:

Replicated to a separate system (SIEM, write-once storage, a remote logging service) so that an attacker would need to compromise multiple systems.
Anchored by periodically publishing the latest chain hash to an external timestamping service or blockchain, creating an independent witness that the chain was intact at that point.
Access-controlled so that the systems writing log entries cannot also delete them.
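
The anchoring option can be illustrated with a small checkpoint sketch. The names `make_checkpoint` and `matches_checkpoint`, the checkpoint fields, and the canonical-JSON hashing are all assumptions; in practice the checkpoint would be published to whichever external witness the deployment uses, which is out of scope here.

```python
import hashlib
import json
from typing import List


def entry_hash(entry: dict) -> str:
    """SHA-256 over canonical JSON (an assumed encoding)."""
    encoded = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_checkpoint(log: List[dict]) -> dict:
    """Summarise the chain head: how many entries exist and the hash of the last one."""
    if not log:
        raise ValueError("cannot checkpoint an empty log")
    return {"entry_count": len(log), "head_hash": entry_hash(log[-1])}


def matches_checkpoint(log: List[dict], checkpoint: dict) -> bool:
    """True if the log still holds the witnessed entry at the witnessed position."""
    count = checkpoint["entry_count"]
    if len(log) < count:
        return False  # entries were removed after the checkpoint was taken
    return entry_hash(log[count - 1]) == checkpoint["head_hash"]


# Build a short chain and record a checkpoint for it.
log: List[dict] = []
for name in ("search", "fetch", "summarise"):
    previous = entry_hash(log[-1]) if log else None
    log.append({"tool_name": name, "previous_entry_hash": previous})

checkpoint = make_checkpoint(log)  # store this outside the logging system
```

Because each entry's hash covers the link to its predecessor, a matching checkpoint combined with a clean chain scan indicates that every entry up to the checkpoint is unchanged. This catches both truncation and a wholesale rewrite with recomputed links, but only if the checkpoint lives somewhere the log writer cannot alter.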

Answering the Questions That Matter

With MPP's audit log, you can answer the questions that matter to each stakeholder:

For the Security Team

"Was the tool verified before it ran?" Check signature_valid and content_hash. The log records whether the Gatekeeper passed the package and what its cryptographic identity was.

"What could the tool access?" Check capabilities_granted. The log records the exact set of capabilities — specific network domains, filesystem paths, and environment variables — that were available to the tool during execution.

"Did anyone approve this?" Check hitl_approved, hitl_approver, and confirmation_level. The log records whether a human was in the loop, who they were, and what confirmation level was required.

For the Compliance Team

"Can you prove that PII was handled correctly?" Check privacy_filters_applied. The log records which PII patterns (email, phone, SSN, credit card, custom patterns) were detected and redacted in the tool's response. Combined with output_hash, you can verify that the post-filtering output is what reached the agent.

"Was the principle of least privilege enforced?" Compare capabilities_requested against capabilities_granted. The log shows what the tool asked for and what it actually received — and the policy decision that bridged the two.

"Do you have a complete audit trail?" Verify the hash chain. If every previous_entry_hash matches the recomputed hash of the preceding entry, the log is intact and complete. No entries have been modified, inserted, or deleted.

For Incident Response

"What exactly happened at 14:32?" Query by timestamp. The log gives you the complete operational context for any invocation: which package ran, what version, who signed it, what capabilities were granted, what input it received (hashed for privacy), and what output it produced.

"Was this tool compromised?" Check the content_hash against known-good values. If a tool's content hash changed between invocations without a corresponding version update, something modified the package.

"How far does the blast radius extend?" Query by package_id and capabilities_granted. The log tells you every invocation of the suspect tool and the exact set of resources each invocation had access to. The blast radius is bounded by the union of capabilities granted across all invocations.

Integration with Enterprise Logging Infrastructure

MPP's audit log is designed to integrate with existing enterprise logging pipelines, not replace them. The structured records can be:

Forwarded to a SIEM (Splunk, Sentinel, Elastic Security) for correlation with other security events. An alert that fires when a tool from an untrusted publisher is granted network access can be built from the signature_valid, publisher_key_id, and capabilities_granted fields.

Fed into compliance reporting (SOC 2, ISO 27001, HIPAA). The privacy_filters_applied field provides evidence for data protection controls. The hitl_approved field provides evidence for human oversight. The hash chain provides evidence for log integrity.

Stored in write-once storage (S3 with Object Lock, Azure Immutable Blob, WORM drives) for long-term retention. Once replicated to write-once storage, the hash chain serves as a verifiable index: you can confirm that the data in cold storage matches what was originally logged.

Visualised in dashboards for operational monitoring. Track invocations per tool, approval rates, privacy filter hit rates, and execution latencies. Identify tools that consistently trigger high sensitivity scores or that produce output requiring PII redaction.
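
As an illustration of the SIEM alert mentioned above (a tool from an untrusted publisher being granted network access), the rule can be prototyped in Python before being translated into a SIEM's own query language. The function name, the `network:` capability prefix, and the idea of a trusted key list are assumptions; the text names the fields involved but not their exact encoding.

```python
from typing import Iterable, List, Set

NETWORK_PREFIX = "network:"  # hypothetical naming scheme for network capabilities


def untrusted_network_grants(records: Iterable[dict], trusted_key_ids: Set[str]) -> List[dict]:
    """Flag invocations where an unverified or unknown publisher got network access."""
    alerts: List[dict] = []
    for record in records:
        trusted = record["signature_valid"] and record["publisher_key_id"] in trusted_key_ids
        network = [c for c in record["capabilities_granted"] if c.startswith(NETWORK_PREFIX)]
        if not trusted and network:
            alerts.append({
                "timestamp": record["timestamp"],
                "package_id": record["package_id"],
                "publisher_key_id": record["publisher_key_id"],
                "network_capabilities": network,
            })
    return alerts


records = [
    {"timestamp": "2024-01-01T14:32:00Z", "package_id": "pkg.weather",
     "publisher_key_id": "pub_known", "signature_valid": True,
     "capabilities_granted": ["network:api.example.com"]},
    {"timestamp": "2024-01-01T14:33:00Z", "package_id": "pkg.scraper",
     "publisher_key_id": "pub_unknown", "signature_valid": True,
     "capabilities_granted": ["network:example.org", "fs:/tmp"]},
]
alerts = untrusted_network_grants(records, trusted_key_ids={"pub_known"})
```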

The Operational Discipline

Audit logging is not a set-and-forget control. It requires the same operational discipline as any other security mechanism:

Verify the chain periodically. Run automated chain verification on a schedule (hourly, daily) and alert if the chain is broken. A broken chain may indicate tampering, data corruption, or a bug in the logging pipeline.

Retain logs for the required duration. Regulatory requirements vary by regime and record type (GDPR: no longer than necessary for the processing purpose; HIPAA: 6 years for required documentation; SOX: 7 years for audit records). Define a retention policy and ensure the logging pipeline enforces it.

Protect log integrity. The hash chain detects tampering but does not prevent it. Use access controls, write-once storage, and replication to reduce the risk that log entries can be modified at all.

Monitor for anomalies. An unusually high number of Denied policy decisions, a spike in privacy_filters_applied matches, or a tool whose content_hash changes unexpectedly are all signals that warrant investigation.
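
One of these signals, a content_hash that changes without a version change, is straightforward to check mechanically. The sketch below groups records by package and version and reports any pair seen with more than one hash; the function name `content_hash_drift` and the sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def content_hash_drift(records: Iterable[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Return each (package_id, package_version) observed with more than one content_hash."""
    seen: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for record in records:
        key = (record["package_id"], record["package_version"])
        seen[key].add(record["content_hash"])
    return {key: sorted(hashes) for key, hashes in seen.items() if len(hashes) > 1}


records = [
    {"package_id": "pkg.weather", "package_version": "1.0.0", "content_hash": "aaa"},
    {"package_id": "pkg.weather", "package_version": "1.0.0", "content_hash": "bbb"},
    {"package_id": "pkg.weather", "package_version": "1.1.0", "content_hash": "ccc"},
    {"package_id": "pkg.notes", "package_version": "2.0.0", "content_hash": "ddd"},
]
drift = content_hash_drift(records)
```

A new hash under a new version is expected and is not reported. Only the same version appearing with different contents is flagged.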

The audit log is not just a record of what happened. It is, over time, the evidence base that allows an organisation to extend agent autonomy with confidence. As the log demonstrates that tools behave as expected, that permissions are respected, and that PII is consistently filtered, the case for granting agents more latitude becomes evidence-based rather than faith-based.

For the security operations procedures that complement audit logging, see the Security Operations documentation. For the governance framework that audit logs support, read The Enterprise Case for AI Tool Governance.