Skip to content

Threat Model

This page documents the PRX threat model -- the set of threats we consider, our security assumptions, and the mitigations in place.

Threat Categories

1. Prompt Injection

Threat: Adversarial content in user input or retrieved data manipulates the agent into performing unintended actions.

Mitigations:

  • Tool call approval workflow
  • Policy engine restricts available actions
  • Input sanitization for known injection patterns

2. Tool Abuse

Threat: The agent uses tools in unintended ways (e.g., reading sensitive files, making unauthorized network requests).

Mitigations:

  • Sandbox isolation for tool execution
  • Policy engine with deny-by-default rules
  • Per-tool rate limiting
  • Audit logging of all tool calls

3. Data Exfiltration

Threat: Sensitive data from the local system is sent to external services via LLM context or tool calls.

Mitigations:

  • Network allowlisting in sandbox
  • Content filtering for sensitive patterns (API keys, passwords)
  • Policy rules restricting data flow

4. Supply Chain

Threat: Malicious plugins or dependencies compromise the agent.

Mitigations:

  • WASM sandbox for plugins
  • Plugin permission manifests
  • Dependency auditing (cargo audit)

Security Assumptions

  • The host operating system is trusted
  • LLM providers handle API keys securely
  • The user is responsible for reviewing agent actions when approval is required

Reporting Vulnerabilities

If you discover a security vulnerability, please report it to [email protected].

Released under the Apache-2.0 License.