Custom Compatible

Connect PRX to any LLM API that follows the OpenAI Chat Completions format. Works with LiteLLM, vLLM, Groq, Mistral, xAI, Venice, Vercel AI, Cloudflare AI, HuggingFace Inference, and any other OpenAI-compatible service.

Prerequisites

A running LLM API that implements the OpenAI Chat Completions format (/v1/chat/completions or /chat/completions)
An API key (if required by the service)

Quick Setup

1. Identify Your Endpoint

Determine the base URL and authentication method for your API. For example:

Groq: https://api.groq.com/openai/v1
Mistral: https://api.mistral.ai/v1
xAI: https://api.x.ai/v1
Local vLLM: http://localhost:8000/v1
LiteLLM proxy: http://localhost:4000

2. Configure

toml

[default]
provider = "compatible"
model = "your-model-name"

[providers.compatible]
api_key = "${YOUR_API_KEY}"
api_url = "https://api.your-provider.com/v1"

3. Verify

bash

prx doctor models

Built-in Compatible Providers

PRX includes pre-configured aliases for popular OpenAI-compatible services:

Provider Name	Aliases	Base URL	Auth Style
Venice	`venice`	`https://api.venice.ai`	Bearer
Vercel AI	`vercel`, `vercel-ai`	`https://api.vercel.ai`	Bearer
Cloudflare AI	`cloudflare`, `cloudflare-ai`	`https://gateway.ai.cloudflare.com/v1`	Bearer
Groq	`groq`	`https://api.groq.com/openai/v1`	Bearer
Mistral	`mistral`	`https://api.mistral.ai/v1`	Bearer
xAI	`xai`, `grok`	`https://api.x.ai/v1`	Bearer
Qianfan	`qianfan`, `baidu`	`https://aip.baidubce.com`	Bearer
Synthetic	`synthetic`	`https://api.synthetic.com`	Bearer
OpenCode Zen	`opencode`, `opencode-zen`	`https://opencode.ai/zen/v1`	Bearer
LiteLLM	`litellm`, `lite-llm`	configurable	Bearer
vLLM	`vllm`, `v-llm`	configurable	Bearer
HuggingFace	`huggingface`, `hf`	configurable	Bearer

Configuration Reference

Field	Type	Default	Description
`api_key`	string	optional	API authentication key
`api_url`	string	required	Base URL of the API endpoint
`model`	string	required	Model name/ID to use
`auth_style`	string	`"bearer"`	Authentication header style (see below)

Authentication Styles

Style	Header Format	Usage
`bearer`	`Authorization: Bearer <key>`	Most providers (default)
`x-api-key`	`x-api-key: <key>`	Some Chinese providers
`custom`	Custom header name	Special cases

Features

Automatic Endpoint Detection

PRX automatically appends /chat/completions to your base URL. You do not need to include the endpoint path:

toml

# Correct - PRX appends /chat/completions
api_url = "https://api.groq.com/openai/v1"

# Also correct - explicit path works too
api_url = "https://api.groq.com/openai/v1/chat/completions"

Responses API Fallback

For providers that support OpenAI's newer Responses API, PRX can fall back to /v1/responses when /v1/chat/completions returns a 404. This is enabled by default but can be disabled for providers that do not support it (e.g., GLM/Zhipu).

Native Tool Calling

Tools are sent in OpenAI's standard function-calling format:

json

{
  "type": "function",
  "function": {
    "name": "tool_name",
    "description": "Tool description",
    "parameters": { "type": "object", "properties": {...} }
  }
}

The provider supports tool_choice: "auto" and properly deserializes structured tool_calls responses.

Vision Support

For vision-capable models, images embedded in messages as [IMAGE:data:image/png;base64,...] markers are automatically converted to the OpenAI vision format with image_url content blocks.

Streaming Support

The compatible provider supports SSE streaming for real-time token delivery. Stream events are parsed incrementally with support for:

delta.content text chunks
delta.tool_calls for incremental tool call construction
[DONE] marker detection
Graceful timeout handling

System Message Merging

Some providers (e.g., MiniMax) reject role: system messages. PRX can automatically merge system message content into the first user message. This is enabled by default for known incompatible providers.

HTTP/1.1 Forced Mode

Some providers (notably DashScope/Qwen) require HTTP/1.1 instead of HTTP/2. PRX automatically detects these endpoints and forces HTTP/1.1 for connection reliability.

Reasoning Content Fallback

For reasoning models that return output in reasoning_content instead of content, PRX automatically falls back to extract the reasoning text.

Advanced Configuration

Local LLM Server (vLLM, llama.cpp, etc.)

toml

[default]
provider = "compatible"
model = "meta-llama/Llama-3.1-8B-Instruct"

[providers.compatible]
api_url = "http://localhost:8000/v1"
# No api_key needed for local servers

LiteLLM Proxy

toml

[default]
provider = "litellm"
model = "gpt-4o"

[providers.litellm]
api_key = "${LITELLM_API_KEY}"
api_url = "http://localhost:4000"

Multiple Custom Providers

Use the model router to configure multiple compatible providers:

toml

[default]
provider = "openrouter"
model = "anthropic/claude-sonnet-4"

[[model_routes]]
pattern = "groq/*"
provider = "compatible"
api_url = "https://api.groq.com/openai/v1"
api_key = "${GROQ_API_KEY}"

[[model_routes]]
pattern = "mistral/*"
provider = "compatible"
api_url = "https://api.mistral.ai/v1"
api_key = "${MISTRAL_API_KEY}"

Troubleshooting

Connection refused

Ensure the API endpoint is reachable:

bash

curl -v https://api.your-provider.com/v1/models

401 Unauthorized

Verify your API key is correct
Check the authentication style matches your provider (Bearer vs x-api-key)
Some providers require additional headers; use a named provider alias if available

"role: system" rejected

If your provider does not support system messages, PRX should handle this automatically for known providers. For custom endpoints, this is a provider limitation. Workaround: include system instructions in the first user message.

Streaming not working

Not all OpenAI-compatible APIs support streaming. If streaming fails, PRX falls back to non-streaming mode automatically.

Model not found

Verify the exact model name/ID your provider expects. Different providers use different naming conventions:

Groq: llama-3.3-70b-versatile
Mistral: mistral-large-latest
xAI: grok-2

Check your provider's documentation for the correct model identifiers.

Custom Compatible ​

Prerequisites ​

Quick Setup ​

1. Identify Your Endpoint ​

2. Configure ​

3. Verify ​

Built-in Compatible Providers ​

Configuration Reference ​

Authentication Styles ​

Features ​

Automatic Endpoint Detection ​

Responses API Fallback ​

Native Tool Calling ​

Vision Support ​

Streaming Support ​

System Message Merging ​

HTTP/1.1 Forced Mode ​

Reasoning Content Fallback ​

Advanced Configuration ​

Local LLM Server (vLLM, llama.cpp, etc.) ​

LiteLLM Proxy ​

Multiple Custom Providers ​

Troubleshooting ​

Connection refused ​

401 Unauthorized ​

"role: system" rejected ​

Streaming not working ​

Model not found ​

Custom Compatible

Prerequisites

Quick Setup

1. Identify Your Endpoint

2. Configure

3. Verify

Built-in Compatible Providers

Configuration Reference

Authentication Styles

Features

Automatic Endpoint Detection

Responses API Fallback

Native Tool Calling

Vision Support

Streaming Support

System Message Merging

HTTP/1.1 Forced Mode

Reasoning Content Fallback

Advanced Configuration

Local LLM Server (vLLM, llama.cpp, etc.)

LiteLLM Proxy

Multiple Custom Providers

Troubleshooting

Connection refused

401 Unauthorized

"role: system" rejected

Streaming not working

Model not found