Agent Loop
The agent loop is the central execution cycle that drives every PRX agent session. Each iteration processes an LLM response, dispatches tool calls, manages memory, and decides whether to continue or return a final answer.
Loop Lifecycle
User Message
│
▼
┌─────────────┐
│ Build Context│──── Memory Recall
└──────┬──────┘
▼
┌─────────────┐
│ LLM Inference│──── Streaming Response
└──────┬──────┘
▼
┌─────────────┐
│ Parse Output │──── Tool Calls / Text
└──────┬──────┘
▼
Tool Calls?
├── Yes ──→ Execute Tools ──→ Loop Again
└── No ──→ Return ResponseTool Dispatch
When the LLM response contains tool calls, the loop:
- Validates each tool call against the security policy
- Executes approved calls (potentially in parallel)
- Collects results and feeds them back to the LLM
- Continues the loop for the next inference step
Streaming
PRX streams LLM responses token-by-token to the client while simultaneously buffering for tool-call detection. The streaming pipeline supports:
- Real-time token forwarding to CLI or WebSocket clients
- Backpressure handling when the client is slow
- Graceful cancellation via Ctrl+C or API signals
Memory Recall
Before each LLM call, the loop retrieves relevant context from the memory system:
- Recent conversation turns (sliding window)
- Semantic search results from the embedding store
- Pinned facts and user preferences
Context Compaction
When the conversation exceeds the model's context window, the loop triggers compaction:
- Summarize older turns into a condensed representation
- Preserve tool call results that are still referenced
- Maintain the system prompt and pinned memories intact
Configuration
toml
[agent.loop]
max_iterations = 50
parallel_tool_calls = true
compaction_threshold_tokens = 80000
compaction_strategy = "summarize" # or "truncate"Related Pages
- Agent Runtime -- Architecture overview
- Sub-agents -- Child agent spawning
- Memory System -- Memory backends and recall