Privacy-Scanner
Overview
Privacy-Scanner is a PII and sensitive-data detection tool that scans structured and unstructured content against a declarative pattern manifest. It demonstrates how MPP enables data-processing tools with zero retention — content streams through the WASM sandbox, pattern matches are emitted, and no input data is ever stored or exfiltrated.
Detect sensitive data in agent workflows — without the data ever leaving the sandbox.
Manifest
[package]
name = "privacy-scanner"
version = "0.1.4"
description = "PII and sensitive-data detection tool"
authors = ["MPP Reference Team"]
license = "Apache-2.0"
[runtime]
target = "wasm32-wasi"
memory = "32MB"
[permissions]
input = "read-stream"
output = "pattern-match-only"
network = "deny"
fs = "deny"
[permissions.patterns]
builtin = [
"email", "phone", "ssn", "credit-card",
"ip-address", "date-of-birth", "passport",
"iban", "drivers-license"
]
custom = "allowed"
[signing]
algorithm = "Ed25519"
key_id = "mpp-reference-2025"
Architecture
Privacy-Scanner processes input as a read-only stream. The WASM module applies compiled pattern matchers against each chunk and emits structured match results. The pattern-match-only output permission ensures the tool can only return match metadata (type, location, confidence) — never the matched content itself.
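As a rough illustration of what a pattern-match-only gate could look like, the sketch below keeps a match record only if its pattern name is known, and copies across only numeric metadata fields. The function name `gate_matches` and the field names (borrowed from the example response on this page) are assumptions; the runtime's actual filter is not described here.

```python
# Hypothetical sketch of a pattern-match-only output gate.
# Field names follow the example response on this page; the real
# runtime filter may work differently.
NUMERIC_FIELDS = ("line", "offset", "length", "confidence")

def gate_matches(matches, known_patterns):
    """Return sanitized copies of match records containing metadata only."""
    gated = []
    for match in matches:
        # Drop records whose pattern name is not a known identifier, so
        # matched text cannot be smuggled out through the name field.
        if match.get("pattern") not in known_patterns:
            continue
        record = {"pattern": match["pattern"]}
        for field in NUMERIC_FIELDS:
            value = match.get(field)
            # bool is a subclass of int, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                record[field] = value
        gated.append(record)
    return gated

raw = [
    {"pattern": "ssn", "line": 1, "offset": 10, "length": 11,
     "confidence": 0.97, "text": "123-45-6789"},   # "text" must not survive
    {"pattern": "123-45-6789", "offset": 0},        # bogus pattern name
]
print(gate_matches(raw, {"email", "ssn"}))
```

Whitelisting fields, rather than blacklisting a `text` field, means any new field a tool invents is dropped by default.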
Execution Flow
- Agent Request: The host AI agent sends text content (or a reference to a data source) to the tool via the MPP invoke interface, along with the set of patterns to scan for.
- Stream Ingestion: Content is streamed into the WASM module in chunks. No chunk is retained after processing — memory is zeroed between chunks.
- Pattern Matching: Each chunk is evaluated against the selected pattern set (builtin and/or custom regex patterns). Each match is recorded as a (position, type, confidence) tuple.
- Output Gate: The runtime's output filter enforces pattern-match-only, so the tool can emit match metadata but cannot echo back the original content or the matched text.
- Response: A structured report of detected patterns, their locations, and confidence scores is returned via the MPP result channel.
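The flow above can be sketched in a few lines. This is a simplified illustration, not the tool's implementation: the two regexes are stand-ins for the builtin matchers, confidence scoring is omitted, and every chunk is assumed to end on a line boundary so that no match spans two chunks.

```python
import re

# Illustrative stand-ins; the tool's real builtin matchers are not shown here.
PATTERNS = {
    "email": re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_stream(chunks, selected):
    """Yield match metadata chunk by chunk; matched bytes are never yielded.

    Assumes each chunk ends on a line boundary.
    """
    base = 0   # absolute byte offset of the current chunk
    line = 1   # line number at the start of the current chunk
    for chunk in chunks:
        found = []
        for name in selected:
            for m in PATTERNS[name].finditer(chunk):
                found.append({
                    "pattern": name,
                    "line": line + chunk.count(b"\n", 0, m.start()),
                    "offset": base + m.start(),
                    "length": m.end() - m.start(),
                })
        yield from sorted(found, key=lambda r: r["offset"])
        base += len(chunk)
        line += chunk.count(b"\n")
        # The chunk is not referenced past this point; a real module
        # would also zero the buffer that held it.

for record in scan_stream([b"contact: a@b.io\n", b"ssn 123-45-6789\n"],
                          ["email", "ssn"]):
    print(record)
```

Only the running byte offset and line counter carry over between chunks, which is what makes forward-only processing possible.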
Security Boundaries
| Layer | Control |
|---|---|
| WASM sandbox | Linear memory isolation — input data cannot escape the module boundary |
| Input: read-stream | Content is processed as a forward-only stream; no random access or re-reads |
| Output: pattern-match-only | Results contain match type and location, never the matched PII content itself |
| Network deny | No outbound connections — detected PII cannot be exfiltrated |
| FS deny | No file-system access — zero data persistence between invocations |
| Ed25519 signature | Package integrity is verified before any code is loaded |
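A host that wants to confirm these boundaries before invoking the tool could compare the manifest's declared permissions against the policy it expects. The sketch below assumes the `[permissions]` table has already been parsed into a dictionary; `policy_violations` is a hypothetical helper, not part of MPP.

```python
# Hypothetical host-side check. `REQUIRED` mirrors the [permissions]
# table from the manifest on this page.
REQUIRED = {
    "input": "read-stream",
    "output": "pattern-match-only",
    "network": "deny",
    "fs": "deny",
}

def policy_violations(permissions):
    """Return the sorted permission keys that differ from the expected policy."""
    return sorted(
        key for key, expected in REQUIRED.items()
        if permissions.get(key) != expected
    )

declared = {"input": "read-stream", "output": "pattern-match-only",
            "network": "deny", "fs": "deny"}
print(policy_violations(declared))                            # no violations
print(policy_violations({**declared, "network": "allow"}))   # flags "network"
```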
Permissions Detail
- input — read-stream: The tool receives content as a forward-only byte stream. It cannot request specific offsets, re-read previous chunks, or buffer the entire input.
- output — pattern-match-only: The runtime filters the tool's output so that it contains only match metadata (pattern type, line number, byte offset, match length, confidence score). Any raw matched text the tool tries to include is stripped.
- network — deny: No outbound connections of any kind. Even if the pattern-match-only gate were bypassed, there is no channel to send data out.
- fs — deny: No file-system access. The tool is stateless between invocations.
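Because results carry only offsets and lengths, a host that still holds the original content can act on them itself, for example by redacting the reported spans. The sketch below is one way to do that. It assumes `offset` is a 0-based byte offset into the UTF-8 encoding of the content and that reported spans do not overlap; `redact` is a hypothetical helper.

```python
def redact(content, matches):
    """Replace each reported span with a [PATTERN] placeholder.

    Assumes `offset` is a 0-based byte offset into the UTF-8 encoding
    of `content` and that spans do not overlap.
    """
    data = content.encode("utf-8")
    # Work from the end so earlier offsets stay valid after each replacement.
    for m in sorted(matches, key=lambda r: r["offset"], reverse=True):
        start = m["offset"]
        end = start + m["length"]
        label = "[" + m["pattern"].upper() + "]"
        data = data[:start] + label.encode("utf-8") + data[end:]
    return data.decode("utf-8")

text = "SSN on file: 123-45-6789."
report = [{"pattern": "ssn", "offset": text.index("123"), "length": 11}]
print(redact(text, report))
```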
Builtin Patterns
| Pattern | Detects | Example Format |
|---|---|---|
| email | Email addresses | user@example.com |
| phone | Phone numbers (international) | +1-555-123-4567 |
| ssn | US Social Security Numbers | 123-45-6789 |
| credit-card | Credit/debit card numbers (Luhn-validated) | 4111-1111-1111-1111 |
| ip-address | IPv4 and IPv6 addresses | 192.168.1.1 |
| date-of-birth | Date patterns in PII context | 1990-01-15 |
| passport | Passport numbers (multi-country) | AB1234567 |
| iban | International Bank Account Numbers | GB29 NWBK 6016 1331 9268 19 |
| drivers-license | Driver's license numbers (US states) | D123-4567-8901 |
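The credit-card row mentions Luhn validation, which lets a matcher discard digit runs that merely look like card numbers. The standard Luhn checksum is shown below as a standalone sketch; how the tool wires it into its matcher is not documented here.

```python
def luhn_valid(number):
    """Return True if the ASCII digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9   # same as summing the two digits of the product
        total += digit
    return total % 10 == 0

print(luhn_valid("4111-1111-1111-1111"))  # the table's example format
print(luhn_valid("4111-1111-1111-1112"))  # last digit altered
```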
Usage Example
# Install from registry
mpp install privacy-scanner@0.1.4
# Verify signature before first run
mpp verify privacy-scanner
# ✓ Ed25519 signature valid (key: mpp-reference-2025)
# ✓ Manifest hash matches archive
# ✓ Permissions: input(read-stream), output(pattern-match-only), network(deny), fs(deny)
# Scan text content for PII
mpp run privacy-scanner --input '{
"content": "Please contact John at john.doe@company.com or 555-123-4567.",
"patterns": ["email", "phone", "ssn"]
}'
# Example response (note: matched text is NOT included)
{
"status": "ok",
"matches": [
{
"pattern": "email",
"line": 1,
"offset": 23,
"length": 20,
"confidence": 0.99
},
{
"pattern": "phone",
"line": 1,
"offset": 47,
"length": 12,
"confidence": 0.95
}
],
"summary": {
"total_matches": 2,
"patterns_found": ["email", "phone"],
"patterns_clean": ["ssn"],
"bytes_scanned": 60,
"elapsed_ms": 2
}
}
Custom Patterns
In addition to the builtin set, you can define custom regex patterns in the input:
mpp run privacy-scanner --input '{
"content": "Employee ID: EMP-2025-00142, Project: ATLAS-7",
"patterns": ["email"],
"custom_patterns": [
{
"name": "employee-id",
"regex": "EMP-\\d{4}-\\d{5}",
"description": "Internal employee identifier"
}
]
}'
Threat Mitigations
- PII Leakage via Output: The pattern-match-only output gate ensures matched text is never included in results. Only position and type metadata are returned.
- Data Retention: WASM linear memory is zeroed between stream chunks and wiped on invocation end. No data persists.
- Data Exfiltration: Network and file-system access are both denied. Detected PII has no path out of the sandbox.
- Pattern Injection: Custom patterns are compiled by a restricted, RE2-compatible regex engine. Matching time grows linearly with input size, so a malicious pattern cannot trigger catastrophic backtracking.
- Supply-Chain Tampering: The Ed25519 signature covers the entire archive. Any modification invalidates the signature and prevents execution.
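RE2-style engines achieve this guarantee partly by rejecting constructs that require backtracking, namely lookaround assertions and backreferences. A caller could pre-screen custom patterns for those constructs before submitting them. The check below is a hypothetical, deliberately conservative pre-filter (it can flag an escaped backslash followed by a digit); it is not how the tool itself validates patterns.

```python
import re

# Constructs RE2 does not support: lookahead, lookbehind, and backreferences
# (numbered, Python-style named, and \k<name>).
_UNSUPPORTED = re.compile(r"\(\?=|\(\?!|\(\?<=|\(\?<!|\\[1-9]|\(\?P=|\\k<")

def uses_backtracking_features(pattern):
    """Return True if `pattern` appears to use lookaround or backreferences."""
    return _UNSUPPORTED.search(pattern) is not None

print(uses_backtracking_features(r"EMP-\d{4}-\d{5}"))  # employee-id pattern from this page
print(uses_backtracking_features(r"(a+)\1"))           # numbered backreference
```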
Source & Build
# Clone the reference repo
git clone https://github.com/mpp-protocol/reference-tools.git
cd reference-tools/privacy-scanner
# Build the WASM module (Rust 1.84 and later name this target wasm32-wasip1)
cargo build --target wasm32-wasi --release
# Package as .mpp artifact
mpp pack --sign --key ~/.mpp/keys/mpp-reference-2025.key
The resulting privacy-scanner-0.1.4.mpp artifact can be published to any MPP-compatible registry or shared directly as a signed file.