Reference Tool · v0.1.4

Privacy-Scanner

Overview

Privacy-Scanner is a PII and sensitive-data detection tool that scans structured and unstructured content against a declarative pattern manifest. It demonstrates how MPP enables data-processing tools with zero retention — content streams through the WASM sandbox, pattern matches are emitted, and no input data is ever stored or exfiltrated.

Detect sensitive data in agent workflows — without the data ever leaving the sandbox.

Manifest

[package]
name        = "privacy-scanner"
version     = "0.1.4"
description = "PII and sensitive-data detection tool"
authors     = ["MPP Reference Team"]
license     = "Apache-2.0"

[runtime]
target = "wasm32-wasi"
memory = "32MB"

[permissions]
input   = "read-stream"
output  = "pattern-match-only"
network = "deny"
fs      = "deny"

[permissions.patterns]
builtin = [
  "email", "phone", "ssn", "credit-card",
  "ip-address", "date-of-birth", "passport",
  "iban", "drivers-license"
]
custom  = "allowed"

[signing]
algorithm = "Ed25519"
key_id    = "mpp-reference-2025"

Architecture

Privacy-Scanner processes input as a read-only stream. The WASM module applies compiled pattern matchers against each chunk and emits structured match results. The pattern-match-only output permission ensures the tool can only return match metadata (type, location, confidence) — never the matched content itself.

Execution Flow

  1. Agent Request: The host AI agent sends text content (or a reference to a data source) to the tool via the MPP invoke interface, along with the set of patterns to scan for.
  2. Stream Ingestion: Content is streamed into the WASM module in chunks. No chunk is retained after processing — memory is zeroed between chunks.
  3. Pattern Matching: Each chunk is evaluated against the selected pattern set (builtin and/or custom regex patterns). Matches are recorded as position + type + confidence tuples.
  4. Output Gate: The runtime's output filter enforces pattern-match-only — the tool can emit match metadata but cannot echo back the original content or the matched text.
  5. Response: A structured report of detected patterns, their locations, and confidence scores is returned via the MPP result channel.
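The flow above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's implementation: the regexes and confidence values in PATTERNS are simplified stand-ins (the real builtin matchers are not shown in this document), and scan_stream is a hypothetical name. Each chunk is scanned on its own, only metadata is yielded, and the buffer is zeroed before the next chunk is read.

```python
import re
from typing import Iterable, Iterator

# Simplified stand-ins for the builtin matchers (assumed, not the real ones).
PATTERNS = {
    "email": (re.compile(rb"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), 0.99),
    "ssn": (re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"), 0.90),
}


def scan_stream(chunks: Iterable[bytes], selected: list[str]) -> Iterator[dict]:
    """Yield match metadata (never matched text) for a forward-only stream."""
    base = 0  # byte offset of the current chunk within the whole stream
    for chunk in chunks:
        buf = bytearray(chunk)
        for name in selected:
            regex, confidence = PATTERNS[name]
            for m in regex.finditer(buf):
                yield {
                    "pattern": name,
                    "offset": base + m.start(),
                    "length": m.end() - m.start(),
                    "confidence": confidence,
                }
        base += len(buf)
        buf[:] = bytes(len(buf))  # overwrite the buffer with zeros


chunks = [b"Contact: a@b.example\n", b"SSN 123-45-6789 on file\n"]
report = list(scan_stream(chunks, ["email", "ssn"]))
print(report)
```

Because chunks are scanned independently, this sketch would miss a match that straddles a chunk boundary; a real scanner would need some bounded carry-over between chunks to handle that case.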

Security Boundaries

Layer                      | Control
WASM sandbox               | Linear memory isolation; input data cannot escape the module boundary
Input: read-stream         | Content is processed as a forward-only stream; no random access or re-reads
Output: pattern-match-only | Results contain match type and location, never the matched PII content itself
Network: deny              | No outbound connections; detected PII cannot be exfiltrated
FS: deny                   | No file-system access; zero data persistence between invocations
Ed25519 signature          | Package integrity is verified before any code is loaded

Permissions Detail

  • input — read-stream: The tool receives content as a forward-only byte stream. It cannot request specific offsets, re-read previous chunks, or buffer the entire input.
  • output — pattern-match-only: The runtime filters the tool's output so that it contains only match metadata (pattern type, byte offset, match length, line number, confidence score). Any raw matched text the tool tries to include is stripped.
  • network — deny: No outbound connections of any kind. Even if the pattern-match-only gate were bypassed, there would be no network channel to send data out.
  • fs — deny: No file-system access. The tool is stateless between invocations.
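As an illustration of the pattern-match-only rule, the sketch below shows one way a runtime-side filter could work. It is an assumption about the approach, not the MPP runtime's code: output_gate and NUMERIC_FIELDS are hypothetical names. The gate keeps a match only if its pattern name is one the caller requested, and copies through only numeric metadata fields, so free-form strings such as matched text cannot pass.

```python
# Numeric metadata fields allowed through the gate (hypothetical allow-list,
# based on the fields named in this document).
NUMERIC_FIELDS = ("line", "offset", "length", "confidence")


def output_gate(raw_matches: list[dict], requested: set[str]) -> list[dict]:
    """Return matches reduced to allow-listed metadata only."""
    cleaned = []
    for match in raw_matches:
        name = match.get("pattern")
        # Drop matches whose pattern name was not requested; this stops the
        # name field itself from being used to carry arbitrary text.
        if not isinstance(name, str) or name not in requested:
            continue
        kept = {"pattern": name}
        for field in NUMERIC_FIELDS:
            value = match.get(field)
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                kept[field] = value
        cleaned.append(kept)
    return cleaned


leaky = [{"pattern": "email", "offset": 23, "length": 20,
          "confidence": 0.99, "text": "john.doe@company.com"}]
print(output_gate(leaky, {"email"}))
```

An allow-list is used rather than a deny-list so that any field the gate does not recognize is dropped by default.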

Builtin Patterns

Pattern         | Detects                                      | Example Format
email           | Email addresses                              | user@example.com
phone           | Phone numbers (international)                | +1-555-123-4567
ssn             | US Social Security Numbers                   | 123-45-6789
credit-card     | Credit/debit card numbers (Luhn-validated)   | 4111-1111-1111-1111
ip-address      | IPv4 and IPv6 addresses                      | 192.168.1.1
date-of-birth   | Date patterns in PII context                 | 1990-01-15
passport        | Passport numbers (multi-country)             | AB1234567
iban            | International Bank Account Numbers           | GB29 NWBK 6016 1331 9268 19
drivers-license | Driver's license numbers (US states)         | D123-4567-8901
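The credit-card row mentions Luhn validation. For readers unfamiliar with it, here is a short sketch of the standard Luhn checksum. How the tool combines this check with its regex is not described in this document; the luhn_valid helper and the 13 to 19 digit length bound are assumptions made for the example.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    if not 13 <= len(digits) <= 19:  # assumed plausible card-number lengths
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111-1111-1111-1111"))  # True
print(luhn_valid("4111-1111-1111-1112"))  # False
```

The checksum filters out most random 16-digit strings, which is why a Luhn-validated match can carry a higher confidence than a bare digit pattern.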

Usage Example

# Install from registry
mpp install privacy-scanner@0.1.4

# Verify signature before first run
mpp verify privacy-scanner
# ✓ Ed25519 signature valid (key: mpp-reference-2025)
# ✓ Manifest hash matches archive
# ✓ Permissions: input(read-stream), output(pattern-match-only), network(deny), fs(deny)

# Scan text content for PII
mpp run privacy-scanner --input '{
  "content": "Please contact John at john.doe@company.com or 555-123-4567.",
  "patterns": ["email", "phone", "ssn"]
}'

# Example response (note: matched text is NOT included)
{
  "status": "ok",
  "matches": [
    {
      "pattern": "email",
      "line": 1,
      "offset": 23,
      "length": 20,
      "confidence": 0.99
    },
    {
      "pattern": "phone",
      "line": 1,
      "offset": 47,
      "length": 12,
      "confidence": 0.95
    }
  ],
  "summary": {
    "total_matches": 2,
    "patterns_found": ["email", "phone"],
    "patterns_clean": ["ssn"],
    "bytes_scanned": 60,
    "elapsed_ms": 2
  }
}
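Because the report carries only positions, the caller, which still holds the original content, can act on it, for example by masking the reported spans. The sketch below assumes offsets are zero-based byte offsets into the UTF-8 content; that convention is an assumption, and the offsets in the sample report are computed for the sample sentence on that basis. The redact helper is hypothetical.

```python
import json


def redact(content: str, report_json: str, mask: str = "*") -> str:
    """Mask every reported span; offsets are treated as zero-based bytes."""
    data = bytearray(content.encode("utf-8"))
    for match in json.loads(report_json)["matches"]:
        start = match["offset"]
        end = start + match["length"]
        # Replace the span with an equal number of mask bytes.
        data[start:end] = mask.encode("ascii") * (end - start)
    return data.decode("utf-8")


content = "Please contact John at john.doe@company.com or 555-123-4567."
report_json = json.dumps({"matches": [
    {"pattern": "email", "offset": 23, "length": 20},
    {"pattern": "phone", "offset": 47, "length": 12},
]})
print(redact(content, report_json))
```

Working on bytes rather than characters keeps the offsets valid when the content contains multi-byte UTF-8 characters.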

Custom Patterns

In addition to the builtin set, you can define custom regex patterns in the input:

mpp run privacy-scanner --input '{
  "content": "Employee ID: EMP-2025-00142, Project: ATLAS-7",
  "patterns": ["email"],
  "custom_patterns": [
    {
      "name": "employee-id",
      "regex": "EMP-\\d{4}-\\d{5}",
      "description": "Internal employee identifier"
    }
  ]
}'
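When the input is built programmatically, the doubled backslashes in the JSON above come from JSON string escaping rather than from the regex itself. The sketch below builds the same payload with Python's json module and runs a local sanity check with Python's re module. That check is only an approximation: Python's engine accepts constructs (such as backreferences) that an RE2-compatible engine rejects.

```python
import json
import re

custom = {
    "name": "employee-id",
    "regex": r"EMP-\d{4}-\d{5}",  # raw string: single backslashes
    "description": "Internal employee identifier",
}
payload = json.dumps({
    "content": "Employee ID: EMP-2025-00142, Project: ATLAS-7",
    "patterns": ["email"],
    "custom_patterns": [custom],
}, indent=2)
print(payload)  # json.dumps escapes each backslash, giving \\d in the text

# Local sanity check only; the tool's own engine may differ from Python's re.
decoded = json.loads(payload)
m = re.search(decoded["custom_patterns"][0]["regex"], decoded["content"])
print(m.span() if m else "no match")
```

The resulting payload string can then be passed as the value of --input.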

Threat Mitigations

  • PII Leakage via Output: The pattern-match-only output gate ensures matched text is never included in results. Only position and type metadata are returned.
  • Data Retention: WASM linear memory is zeroed between stream chunks and wiped on invocation end. No data persists.
  • Data Exfiltration: Network and file-system access are both denied. Detected PII has no path out of the sandbox.
  • Pattern Injection: Custom patterns are compiled by a restricted, RE2-compatible regex engine that does not backtrack. Matching time grows linearly with input size, so a malicious pattern cannot trigger catastrophic backtracking.
  • Supply-Chain Tampering: The Ed25519 signature covers the entire archive. Any modification invalidates the signature and prevents execution.

Source & Build

# Clone the reference repo
git clone https://github.com/mpp-protocol/reference-tools.git
cd reference-tools/privacy-scanner

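# Build the WASM module
# (Rust 1.84 and later name this target wasm32-wasip1; adjust the flag if wasm32-wasi is not found)
cargo build --target wasm32-wasi --release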

# Package as .mpp artifact
mpp pack --sign --key ~/.mpp/keys/mpp-reference-2025.key

The resulting privacy-scanner-0.1.4.mpp artifact can be published to any MPP-compatible registry or shared directly as a signed file.