
Ollama

Run LLMs locally or on self-hosted infrastructure with Ollama. Supports vision, native tool calling, reasoning models, and optional cloud routing via Ollama Cloud.

Prerequisites

  • Ollama installed and running locally, or
  • A remote Ollama instance with network access

Quick Setup

1. Install Ollama

bash
# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

# Start the server
ollama serve

2. Pull a Model

bash
ollama pull qwen3

3. Configure

toml
[default]
provider = "ollama"
model = "qwen3"

No API key is required for local usage.

4. Verify

bash
prx doctor models

Available Models

Any model available through Ollama can be used. Popular choices include:

| Model | Parameters | Vision | Tool Use | Notes |
|---|---|---|---|---|
| qwen3 | 8B | No | Yes | Excellent multilingual coding model |
| qwen2.5-coder | 7B | No | Yes | Specialized for code |
| llama3.1 | 8B/70B/405B | No | Yes | Meta's open model family |
| mistral-nemo | 12B | No | Yes | Efficient reasoning |
| deepseek-r1 | 7B/14B/32B | No | Yes | Reasoning model |
| llava | 7B/13B | Yes | No | Vision + language |
| gemma2 | 9B/27B | No | Yes | Google's open model |
| codellama | 7B/13B/34B | No | No | Code-specialized Llama |

Run ollama list to see your installed models.

Configuration Reference

| Field | Type | Default | Description |
|---|---|---|---|
| api_key | string | optional | API key for remote/cloud Ollama instances |
| api_url | string | http://localhost:11434 | Ollama server base URL |
| model | string | required | Model name (e.g., qwen3, llama3.1:70b) |
| reasoning | bool | optional | Enable think mode for reasoning models |
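
For illustration, here is a sketch that combines these fields with the [default] and [providers.ollama] sections used elsewhere on this page (the key is read from an environment variable and only matters for remote instances):

toml
[default]
provider = "ollama"
model = "qwen3"

[providers.ollama]
api_url = "http://localhost:11434"   # default; point at a remote server if needed
api_key = "${OLLAMA_API_KEY}"        # only used for non-local endpoints
reasoning = false                    # enable for reasoning models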

Features

Zero Configuration for Local Use

When running Ollama locally, no API key or special configuration is needed. PRX automatically connects to http://localhost:11434.

Native Tool Calling

PRX uses Ollama's native /api/chat tool-calling support. Tool definitions are sent in the request body and structured tool_calls are returned by compatible models (qwen2.5, llama3.1, mistral-nemo, etc.).
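
For reference, the request PRX sends maps onto Ollama's standard /api/chat payload. The sketch below is hand-written for illustration (the shell tool definition and prompt are examples, not PRX's exact output):

bash
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [
    {"role": "user", "content": "List the files in the current directory"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "shell",
      "description": "Run a shell command",
      "parameters": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"]
      }
    }
  }],
  "stream": false
}'

Compatible models answer with a structured tool_calls array in the response message rather than plain text.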

PRX also handles quirky model behaviors:

  • Nested tool calls: {"name": "tool_call", "arguments": {"name": "shell", ...}} are automatically unwrapped
  • Prefixed names: tool.shell is normalized to shell
  • Tool result mapping: Tool call IDs are tracked and mapped to tool_name fields in follow-up tool result messages

Vision Support

Vision-capable models (e.g., LLaVA) receive images via Ollama's native images field. PRX automatically extracts base64 image data from [IMAGE:...] markers and sends them as separate image entries.
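
As a rough illustration of what this looks like at the API level, a vision request carries the base64 data in each message's images array (a sketch; the base64 string is truncated):

bash
curl http://localhost:11434/api/chat -d '{
  "model": "llava",
  "messages": [{
    "role": "user",
    "content": "What is in this picture?",
    "images": ["iVBORw0KGgoAAAANSUhEUg..."]
  }],
  "stream": false
}'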

Reasoning Mode

For reasoning models (QwQ, DeepSeek-R1, etc.), enable the think parameter:

toml
[providers.ollama]
reasoning = true

This sends "think": true in the request, enabling the model's internal reasoning process. If the model returns only a thinking field with empty content, PRX provides a graceful fallback message.
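
At the API level the resulting request looks roughly like this (a sketch with placeholder model and prompt):

bash
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [{"role": "user", "content": "Why is the sky blue?"}],
  "think": true,
  "stream": false
}'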

Remote and Cloud Instances

To connect to a remote Ollama server:

toml
[providers.ollama]
api_url = "https://my-ollama-server.example.com:11434"
api_key = "${OLLAMA_API_KEY}"

Authentication is only sent for non-local endpoints (when the host is not localhost, 127.0.0.1, or ::1).

Cloud Routing

Append :cloud to a model name to force routing through a remote Ollama instance:

bash
prx chat --model "qwen3:cloud"

Cloud routing requires:

  • A non-local api_url
  • An api_key configured

Extended Timeout

Ollama requests use a 300-second timeout (compared to 120 seconds for cloud providers), accounting for the potentially slower inference on local hardware.

Troubleshooting

"Is Ollama running?"

This is the most common error. Solutions:

  • Start the server: ollama serve
  • Check if the port is accessible: curl http://localhost:11434
  • If using a custom port, update api_url in your config
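
A quick check sequence, for reference (the root endpoint normally replies with a short status string when the server is up):

bash
# Is the server reachable?
curl http://localhost:11434

# If not, start it
ollama serve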

Model not found

Pull the model first:

bash
ollama pull qwen3

Empty responses

Some reasoning models may return only thinking content without a final response. This usually means the model stopped prematurely. Try:

  • Sending the request again
  • Using a different model
  • Disabling reasoning mode if the model does not support it well
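
To disable reasoning mode explicitly (removing the line has the same effect, since the field is optional):

toml
[providers.ollama]
reasoning = false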

Tool calls not working

Not all Ollama models support tool calling. Models known to work well:

  • qwen2.5 / qwen3
  • llama3.1
  • mistral-nemo
  • command-r

Cloud routing errors

  • "requested cloud routing, but Ollama endpoint is local": Set api_url to a remote server
  • "requested cloud routing, but no API key is configured": Set api_key or OLLAMA_API_KEY
