Skip to content

Custom Compatible

Connect PRX to any LLM API that follows the OpenAI Chat Completions format. Works with LiteLLM, vLLM, Groq, Mistral, xAI, Venice, Vercel AI, Cloudflare AI, HuggingFace Inference, and any other OpenAI-compatible service.

Prerequisites

  • A running LLM API that implements the OpenAI Chat Completions format (/v1/chat/completions or /chat/completions)
  • An API key (if required by the service)

Quick Setup

1. Identify Your Endpoint

Determine the base URL and authentication method for your API. For example:

  • Groq: https://api.groq.com/openai/v1
  • Mistral: https://api.mistral.ai/v1
  • xAI: https://api.x.ai/v1
  • Local vLLM: http://localhost:8000/v1
  • LiteLLM proxy: http://localhost:4000

2. Configure

toml
[default]
provider = "compatible"
model = "your-model-name"

[providers.compatible]
api_key = "${YOUR_API_KEY}"
api_url = "https://api.your-provider.com/v1"

3. Verify

bash
prx doctor models

Built-in Compatible Providers

PRX includes pre-configured aliases for popular OpenAI-compatible services:

Provider NameAliasesBase URLAuth Style
Venicevenicehttps://api.venice.aiBearer
Vercel AIvercel, vercel-aihttps://api.vercel.aiBearer
Cloudflare AIcloudflare, cloudflare-aihttps://gateway.ai.cloudflare.com/v1Bearer
Groqgroqhttps://api.groq.com/openai/v1Bearer
Mistralmistralhttps://api.mistral.ai/v1Bearer
xAIxai, grokhttps://api.x.ai/v1Bearer
Qianfanqianfan, baiduhttps://aip.baidubce.comBearer
Syntheticsynthetichttps://api.synthetic.comBearer
OpenCode Zenopencode, opencode-zenhttps://opencode.ai/zen/v1Bearer
LiteLLMlitellm, lite-llmconfigurableBearer
vLLMvllm, v-llmconfigurableBearer
HuggingFacehuggingface, hfconfigurableBearer

Configuration Reference

FieldTypeDefaultDescription
api_keystringoptionalAPI authentication key
api_urlstringrequiredBase URL of the API endpoint
modelstringrequiredModel name/ID to use
auth_stylestring"bearer"Authentication header style (see below)

Authentication Styles

StyleHeader FormatUsage
bearerAuthorization: Bearer <key>Most providers (default)
x-api-keyx-api-key: <key>Some Chinese providers
customCustom header nameSpecial cases

Features

Automatic Endpoint Detection

PRX automatically appends /chat/completions to your base URL. You do not need to include the endpoint path:

toml
# Correct - PRX appends /chat/completions
api_url = "https://api.groq.com/openai/v1"

# Also correct - explicit path works too
api_url = "https://api.groq.com/openai/v1/chat/completions"

Responses API Fallback

For providers that support OpenAI's newer Responses API, PRX can fall back to /v1/responses when /v1/chat/completions returns a 404. This is enabled by default but can be disabled for providers that do not support it (e.g., GLM/Zhipu).

Native Tool Calling

Tools are sent in OpenAI's standard function-calling format:

json
{
  "type": "function",
  "function": {
    "name": "tool_name",
    "description": "Tool description",
    "parameters": { "type": "object", "properties": {...} }
  }
}

The provider supports tool_choice: "auto" and properly deserializes structured tool_calls responses.

Vision Support

For vision-capable models, images embedded in messages as [IMAGE:data:image/png;base64,...] markers are automatically converted to the OpenAI vision format with image_url content blocks.

Streaming Support

The compatible provider supports SSE streaming for real-time token delivery. Stream events are parsed incrementally with support for:

  • delta.content text chunks
  • delta.tool_calls for incremental tool call construction
  • [DONE] marker detection
  • Graceful timeout handling

System Message Merging

Some providers (e.g., MiniMax) reject role: system messages. PRX can automatically merge system message content into the first user message. This is enabled by default for known incompatible providers.

HTTP/1.1 Forced Mode

Some providers (notably DashScope/Qwen) require HTTP/1.1 instead of HTTP/2. PRX automatically detects these endpoints and forces HTTP/1.1 for connection reliability.

Reasoning Content Fallback

For reasoning models that return output in reasoning_content instead of content, PRX automatically falls back to extract the reasoning text.

Advanced Configuration

Local LLM Server (vLLM, llama.cpp, etc.)

toml
[default]
provider = "compatible"
model = "meta-llama/Llama-3.1-8B-Instruct"

[providers.compatible]
api_url = "http://localhost:8000/v1"
# No api_key needed for local servers

LiteLLM Proxy

toml
[default]
provider = "litellm"
model = "gpt-4o"

[providers.litellm]
api_key = "${LITELLM_API_KEY}"
api_url = "http://localhost:4000"

Multiple Custom Providers

Use the model router to configure multiple compatible providers:

toml
[default]
provider = "openrouter"
model = "anthropic/claude-sonnet-4"

[[model_routes]]
pattern = "groq/*"
provider = "compatible"
api_url = "https://api.groq.com/openai/v1"
api_key = "${GROQ_API_KEY}"

[[model_routes]]
pattern = "mistral/*"
provider = "compatible"
api_url = "https://api.mistral.ai/v1"
api_key = "${MISTRAL_API_KEY}"

Troubleshooting

Connection refused

Ensure the API endpoint is reachable:

bash
curl -v https://api.your-provider.com/v1/models

401 Unauthorized

  • Verify your API key is correct
  • Check the authentication style matches your provider (Bearer vs x-api-key)
  • Some providers require additional headers; use a named provider alias if available

"role: system" rejected

If your provider does not support system messages, PRX should handle this automatically for known providers. For custom endpoints, this is a provider limitation. Workaround: include system instructions in the first user message.

Streaming not working

Not all OpenAI-compatible APIs support streaming. If streaming fails, PRX falls back to non-streaming mode automatically.

Model not found

Verify the exact model name/ID your provider expects. Different providers use different naming conventions:

  • Groq: llama-3.3-70b-versatile
  • Mistral: mistral-large-latest
  • xAI: grok-2

Check your provider's documentation for the correct model identifiers.

Released under the Apache-2.0 License.